## Comments on the Advanced Computing/Supercomputing IFR: "Developing technical solutions to exempt items otherwise classified under ECCNs 3A090 and 4A090"

**ID:** RIN 0694–AI94 | BIS–2022–0025

**Subject:** Implementation of Additional Export Controls: Certain Advanced Computing Items; Supercomputer and Semiconductor End Use; Updates and Corrections

Date:

**Authors:**

- Onni Aarne, Consultant, Institute for AI Policy and Strategy (ni@iaps.ai)
- Timothy Fist, Fellow, Technology & National Security Program, Center for a New American Security<sup>1</sup> (tfist@cnas.org)
- Caleb Withers, Research Assistant, Technology & National Security Program, Center for a New American Security (cwithers@cnas.org)

#### About this document

This comment represents the views of the authors alone and not those of their employers. The authors commend the Bureau of Industry and Security (BIS) for the Advanced Computing/Supercomputing interim final rule (AC/S IFR). The AC/S IFR takes essential steps to make export controls in this domain better targeted and less burdensome for U.S. firms.

This document responds to BIS's request for public comments on developing technical solutions to exempt items otherwise classified under ECCNs 3A090 and 4A090. Its text is drawn from a report recently released by the authors,<sup>2</sup> which examines the role that "on-chip mechanisms" could play in supporting policy objectives related to supercomputing and broadly capable artificial intelligence, including, in the export control context, limiting their use for training large dual-use AI foundation models. This submission collects information and independent analysis on the possibilities and limitations of such mechanisms. It aims to complement industry proposals and analysis by providing additional, independent technical context and analysis of policy options. Sections especially relevant to BIS's request for public comment are Section 2 (particularly Sections 2.1, 2.2, and 2.4) and Appendix C.


<sup>1</sup> As a research and policy institution committed to the highest standards of organizational, intellectual, and personal integrity, CNAS maintains strict intellectual independence and sole editorial direction and control over its ideas, projects, publications, events, and other research activities. CNAS does not take institutional positions on policy issues and the content of CNAS publications reflects the views of their authors alone. In keeping with its mission and values, CNAS does not engage in lobbying activity and complies fully with all applicable federal, state, and local laws. CNAS will not engage in any representational activities or advocacy on behalf of any entities or interests and, to the extent that the Center accepts funding from non-U.S. sources, its activities will be limited to bona fide scholastic, academic, and research-related activities, consistent with applicable federal law. The Center publicly acknowledges on its website annually all donors who contribute.

<sup>2</sup> Onni Aarne, Timothy Fist, and Caleb Withers, "Secure, Governable Chips," Center for a New American Security, January 2024, <a href="https://www.cnas.org/publications/reports/secure-governable-chips">https://www.cnas.org/publications/reports/secure-governable-chips</a>.

- Executive Summary
- 1 What Would Effective On-Chip Governance Look Like?
- 2 Policies that On-Chip Governance Mechanisms Could Enable
  - 2.1 Operating Licenses to Prevent Unauthorized Use
  - 2.2 Location Verification
  - 2.3 Usage Verification
  - 2.4 Usage Limitations
- 3 Technical Underpinnings
- 4 Challenges for Implementation
  - 4.1 Privacy, Surveillance, and Cybersecurity Implications
  - 4.2 Overview of Threat Models and Defenses
- 5 Implementation Timelines
- 6 Recommendations
- 7 Limitations and Conclusion
- Appendix A: Glossary for AI Compute
- Appendix B: Additional Security Considerations
- Appendix C: Using On-Chip Mechanisms To Prevent Chip Smuggling
  - How on-chip governance mechanisms could help address smuggling
    - Location Monitoring
    - Operating Licenses
  - Policy Options
    - Approach 1: Relying on existing BIS authorities
    - Approach 2: New legislation
    - Approach 3: Motivating better compliance via more aggressive enforcement

## **Executive Summary**

On-chip governance mechanisms could help address two issues related to the current export controls:

1. The country-wide approach to controls has significant downsides. Export to China is restricted even for relatively harmless end uses. This risks harming the competitiveness of U.S. firms, prompting the "de-Americanization" of chip supply chains, and alienating commercial AI developers and partner nations.
2. The current, far-reaching controls are likely to be difficult to enforce. AI chip smuggling is already happening today and is likely to grow significantly in volume over the coming years.<sup>3</sup>

This submission analyzes the possibility of addressing these issues using "on-chip governance mechanisms": secure physical mechanisms built directly into chips or associated hardware that could help deny their usage to unauthorized actors, among other policy goals. Its key conclusions are as follows.

**On-chip governance mechanisms could help safeguard the development and deployment of broadly capable AI and supercomputing systems, in a way that is complementary to American technology leadership.**

One especially promising near-term application is export control enforcement, where on-chip mechanisms could prevent unauthorized actors from using export-controlled AI chips, or place boundaries around that use. Implemented well, this would greatly aid enforcement and reduce the need for top-down export controls that harm the competitiveness of the U.S. chip industry, instead enabling more surgical end-use- or end-user-focused controls if desired. Certain on-chip governance mechanisms also could make existing controls harder to circumvent via smuggling.

**Much of the required functionality for on-chip governance is already widely deployed on various chips, including cutting-edge AI chips.**

Chips sold by leading firms, including AMD, Apple, Intel, and NVIDIA, already have many of the features needed to enable the policies described above. These features are used today in a wide variety of applications. On the iPhone, on-chip mechanisms ensure that unauthorized applications can't be installed. Google uses on-chip mechanisms to remotely verify that chips running in its data centers have not been compromised. Many multiplayer video games now work with a hardware device called a "Trusted Platform Module" to prevent in-game cheating. In the AI space, these features are increasingly used to distribute training across different devices and users while preserving the privacy of code and data.<sup>4</sup>

**On-chip governance does not require secret monitoring of users or insecure "back doors" on hardware.**

On-chip governance is better implemented through privacy-preserving "verification" and "operating licenses" for AI chips used in data centers.

"Verification" involves the user of a chip making claims that are verifiable by another party about what they are doing with the chip. For example, verifying the quantity of computation or the dataset used in a particular training run.<sup>5</sup> Secure on-chip verification of this kind is made possible by a "Trusted Execution

<sup>&</sup>lt;sup>3</sup> Erich Grunewald and Michael Aird. "AI Chip Smuggling into China: Potential Paths, Quantities, and Countermeasures." Institute for AI Policy and Strategy, October 4, 2023. https://www.iaps.ai/research/ai-chip-smuggling-into-china.

<sup>&</sup>lt;sup>4</sup> Fan Mo, Zahra Tarkhani, and Hamed Haddadi, "Machine Learning with Confidential Computing: A Systematization of Knowledge," arXiv, April 2, 2023, http://arxiv.org/abs/2208.10134; Fan Mo et al., "PPFL: Privacy-Preserving Federated Learning with Trusted Execution Environments," arXiv, June 28, 2021, http://arxiv.org/abs/2104.14380; and Xiaoguo Li et al., "A Survey of Secure Computation Using Trusted Execution Environments," arXiv, February 23, 202), http://arxiv.org/abs/2302.12150.

<sup>&</sup>lt;sup>5</sup> For example, a recent White House executive order requires AI developers to report the development of models trained with "biological sequence data" above a certain computation threshold. Such regulations could evolve to require more formal verification of which dataset was used in training, especially if such regulation applied to foreign AI developers accessing U.S. compute via the cloud or U.S.-produced chips. The hardware security features described in this submission could enable this, perhaps using a "Proof of Training Data" protocol of the kind

Environment" (TEE). Because of the TEE's security properties, the verifier can trust that information received from the TEE has not been "spoofed," without the chip's user needing to divulge sensitive data.<sup>6</sup>

"Operating licenses" provide an enforcement mechanism. This is useful in cases where, for example, the chip's owner is found to have acquired the chip in violation of an export control agreement, or if the chip's user refuses to participate in a legally required verification process. Operating licenses would be best enabled using a dedicated "security module" that links the functioning of the chip to a periodically renewed license key from the manufacturer (or a regulator), not unlike the product licenses required to unlock proprietary software. Hardware operating licenses of this kind are already used in some commercial contexts.

These mechanisms should primarily be used on the specialized data center AI chips that are targeted by the current AI chip export controls. However, some limited mechanisms on consumer GPUs may be useful if, in the future, these devices are export-controlled.<sup>7</sup>

## Existing technologies need to be hardened before they can be relied upon in adversarial settings such as export control enforcement.

On-chip governance mechanisms are only useful insofar as they reliably work even when adversaries are actively attempting to circumvent them. <sup>8</sup> Commercial versions of these technologies are not typically designed to defend against a well-resourced attacker with physical access to the hardware. Investments in hardware and software security will be required for on-chip governance mechanisms to function reliably in these kinds of environments.

The specific defenses required to adequately secure on-chip governance mechanisms depend on the context in which they are deployed. This submission explores three contexts: minimally, covertly, and openly adversarial.

## A staged approach to the development and rollout of on-chip governance for data center AI chips is possible.

Intermediate stages of R&D could still be useful in production contexts. In the short term, firmware updates could be deployed to exported AI chips implementing early versions of a hardware operating license linked to the terms of an export license. This would be useful as an additional cautionary measure for already-planned AI chip exports to high-diversion-risk geographies.

A promising and relatively feasible next step would be to make devices "tamper-evident" (attempts to tamper with the chips would leave indelible evidence). This could be a sufficient level of security in cases where occasional physical inspections of the hardware are possible.

For subsequent generations of AI chips, hardware security features could be further hardened, working toward full "tamper-proofing" to make physical inspections less necessary.

<sup>6</sup> In the information security context, "spoofing" refers to the falsification of data by an attacker. See "Spoofing Attack," Wikipedia, <a href="https://en.wikipedia.org/w/index.php?title=Spoofing\_attack&oldid=1166570796">https://en.wikipedia.org/w/index.php?title=Spoofing\_attack&oldid=1166570796</a>.

"Implementation of Additional Export Controls: Certain Advanced Computing Items; Supercomputer and Semiconductor End Use; Updates and Corrections.", Supplementary information section C.2, 88 Fed. Reg. 73458, October 25, 2023, <a href="https://www.federalregister.gov/d/2023-23055/p-204">https://www.federalregister.gov/d/2023-23055/p-204</a>.

described here: Dami Choi, Yonadav Shavit, and David Duvenaud. "Tools for Verifying Neural Models' Training Data," July 2, 2023, https://doi.org/10.48550/arXiv.2307.00682.

<sup>&</sup>lt;sup>7</sup> Ideally this would be avoided by chip firms further differentiating consumer and data center GPUs designs. However, the Commerce Department recently added a notification requirement for exports of consumer chips with AI-relevant capabilities, suggesting that some consumer GPUs may be export-controlled in the future.

<sup>&</sup>lt;sup>8</sup> Following cybersecurity conventions, this submission uses the term "adversary" to refer to anyone attempting to circumvent or compromise an on-chip mechanism. Thus, the adversary need not be an adversary in a broader sense and can instead be, e.g., a company attempting to evade regulations.

To motivate further investigation of on-chip governance, this submission sketches an example architecture for data center AI chips that could provide a flexible platform for dynamically implementing different governance mechanisms. The core of this proposal is a hardened security module, included on all high-performance data center AI chips, that can ensure that the chip has valid, up-to-date firmware and software and, where applicable, an up-to-date operating license. If these conditions are not met, it would block the chip from operating.

This valid, up-to-date firmware and software then could help enforce limits on the uses of these chips and offer sophisticated "remote attestation" capabilities (remote authentication to securely verify desired properties of the chip and the software it is running). The security module could ensure that if firmware/software vulnerabilities are found, users would have no choice but to update to patched versions where the vulnerability has been fixed. The security module also could be configured to require an up-to-date, chip-specific operating license.
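The firmware check described above can be illustrated with a minimal Python sketch. It assumes the vendor publishes a SHA-256 digest for each approved firmware release and that the security module keeps a tamper-resistant "minimum version" floor (anti-rollback). The names `SecurityModule`, `may_boot`, and `revoke_below` are hypothetical, and real secure-boot chains verify vendor signatures in hardware rather than consulting a Python dictionary.

```python
from __future__ import annotations

import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class FirmwareImage:
    version: int  # monotonically increasing release number (assumed)
    blob: bytes   # raw firmware bytes as loaded at boot


def measure(image: FirmwareImage) -> str:
    """Return the SHA-256 "measurement" of a firmware image."""
    return hashlib.sha256(image.blob).hexdigest()


class SecurityModule:
    """Toy model: boot only approved, sufficiently recent firmware."""

    def __init__(self, approved_digests: dict[int, str], minimum_version: int) -> None:
        # version -> digest published by the vendor (hypothetical distribution channel)
        self.approved_digests = dict(approved_digests)
        # Anti-rollback floor; real hardware would keep this in fuses or protected storage.
        self.minimum_version = minimum_version

    def revoke_below(self, version: int) -> None:
        """Raise the floor once a patched release `version` is available."""
        self.minimum_version = max(self.minimum_version, version)

    def may_boot(self, image: FirmwareImage) -> bool:
        if image.version < self.minimum_version:
            return False  # outdated, possibly vulnerable firmware is refused
        expected = self.approved_digests.get(image.version)
        if expected is None:
            return False  # unknown release
        return hmac.compare_digest(expected, measure(image))


# Usage: two genuine releases; v2 fixes a vulnerability found in v1.
v1 = FirmwareImage(version=1, blob=b"firmware v1")
v2 = FirmwareImage(version=2, blob=b"firmware v2 with security patch")
module = SecurityModule({1: measure(v1), 2: measure(v2)}, minimum_version=1)
```

Raising the floor after a patch is what leaves users "no choice but to update": once `revoke_below(2)` is called, even the genuine v1 image stops booting, while a modified image fails the digest comparison regardless of its version number.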

Current AI chips already have some components of this architecture, but not all. These gaps likely could be closed with moderate development effort as extensions of functionality already in place. The primary technical challenge will be implementing adequate hardware security, particularly for tamper-evidence and tamper-proofing. This submission estimates this could be achieved with as little as 18 months of involved technical effort (and up to 4 years) from leading firms.

Because a small number of allied countries encompass the supply chain for the most advanced AI chips, only a small number of countries would need to coordinate to ensure that all cutting-edge AI chips have these mechanisms built in. On-chip mechanisms would need to be supported by a way to track the ownership of data center AI chips, and some form of inspections to ensure these chips are not tampered with, where required.

On-chip governance mechanisms present a promising area for further research for computer engineers, computer scientists, and policy researchers. This submission offers the following recommendations to U.S. policymakers to move toward a world where all leading AI chips are secure and governable.

#### Establish government coordination

**Recommendation**: The White House should issue an executive order establishing a NIST-led interagency working group, focused on getting on-chip governance mechanisms built into all export-controlled data center AI chips.

**Background**: For on-chip governance to reach commercial scale, long-term collaboration between government and industry will be required. For progress to be made quickly, an executive order could be an appropriate forcing function. The National Institute of Standards and Technology (NIST) would make a suitable lead for this effort. Expertise and staff also should be drawn from the Department of Energy, the Department of Defense, the Department of Homeland Security, the National Science Foundation, and the U.S. intelligence community. The working group should also be informed by a technical panel drawn from industry and academia to help direct technical standards and research.

#### Create commercial incentives

**Recommendation**: The Department of Commerce (DoC) should incentivize U.S. chip designers to conduct necessary R&D using "advance export market commitments."<sup>9</sup>

<sup>9</sup> "Advance market commitments" (AMCs), a relatively new idea, describe binding contracts offered by a government to guarantee a viable market for a product once it has been successfully developed. AMCs have seen success in incentivizing the development of new vaccines: Federation of American Scientists, "Creating Advanced Market Commitments and Prizes for Pandemic Preparedness," https://fas.org/publication/creating-advanced-market-commitments-and-prizes-for-pandemic-preparedness/.

**Background**: Because on-chip governance mechanisms need to be implemented on commercial chips, much of the necessary R&D will need to happen in an industry setting. To incentivize this work, the DoC should consider offering U.S. chip firms commitments regarding future access to export markets, conditional on the firms implementing a specific set of security features on controlled products. Given how much revenue export restrictions cost chip firms, such commitments would be an effective way to incentivize the necessary R&D without spending public money. Export market commitments could include not extending export controls to new jurisdictions, relaxing the "presumption of denial" licensing policy for chip exports to lower-risk customers in China, or moving toward more surgical end-use- or end-user-based controls. The DoC should develop the required feature sets by analyzing specific attacker threat models in different export contexts, in coordination with the U.S. Intelligence Community and the Department of Homeland Security.

#### Accelerate security R&D

**Recommendation**: NIST should coordinate with industry and relevant government funding bodies to scope, fund, and support R&D that can be conducted outside leading chip companies and integrated later.

**Background**: While the large majority of R&D will need to be conducted by the firms building and selling AI chips at scale, some work may be usefully conducted outside these firms, especially on technologies that would benefit from being standardized across the industry. NIST should coordinate with the Semiconductor Research Corporation, relevant Defense Advanced Research Projects Agency (DARPA) program managers, and other relevant government funding bodies to scope and fund useful R&D to be performed by academic and/or commercial partners. For example, work on specialized tamper-proof enclosures for high-end chips (physical housings designed so that any attempt to modify the chip renders it inoperable) could potentially be outsourced to academic and commercial hardware security labs. To support these projects, NIST should create technical standards and reference implementations for on-chip governance mechanisms that are designed for wide adoption by industry.

#### Plan for a staged roll-out and fund extensive red-teaming

**Recommendation:** To ensure that on-chip governance mechanisms are properly designed and safely introduced, the DoC and Department of Homeland Security (DHS) should establish flexible export licensing and red-teaming programs.

**Background**: On-chip mechanisms will require substantial testing before being relied upon in more adversarial environments (e.g., exports of controlled chips to China). To facilitate a staged rollout approach where mechanisms can be depended upon in successively more challenging operating contexts, the DoC should create export licensing arrangements where licenses can be flexibly granted for different geographies based on the security features on the device to be exported. In tandem, the Cybersecurity and Infrastructure Security Agency within DHS should establish red-teaming and bug bounty programs to help find and patch any software and hardware security vulnerabilities. A promising near-term starting point is setting up a public prize for finding vulnerabilities in hardware security features on today's AI chips.

#### Coordinate with allies

**Recommendation**: The State and Commerce Departments should coordinate with allies on policies and standards for on-chip governance.

<sup>10</sup> Stephen Nellis and Jane Lee, "U.S. Officials Order Nvidia to Halt Sales of Top AI Chips to China," Reuters, September 1, 2022, https://www.reuters.com/technology/nvidia-says-us-has-imposed-new-license-requirement-future-exports-china-2022-08-31/.

**Background**: As with many other forms of technology governance, on-chip governance will be of limited effectiveness without international buy-in. The State and Commerce Departments should include the potential role of on-chip governance mechanisms in diplomatic discussions with countries that occupy important positions in the supply chain for cutting-edge AI chips (especially Taiwan, the Netherlands, South Korea, and Japan), including potential new multilateral control regimes.<sup>11</sup> Looking beyond export control coordination, using on-chip governance mechanisms to facilitate AI governance cooperation (e.g., international agreements on compute usage reporting) would benefit from close coordination with like-minded allies, such as the United Kingdom and the European Union.

#### Encourage AI chip firms to move early

**Recommendation**: Chip firms should be encouraged to move early to build and harden the security features required for on-chip governance.

**Background**: The United States has signaled interest in on-chip governance in a recent request for comment issued by the Department of Commerce.<sup>12</sup> Chip suppliers that are better able to apply and build on existing technical efforts will have a head start in demonstrating and achieving compliance, with potential benefits in terms of access to markets that are subject to export controls or other relevant regulation.

Developing and deploying the mechanisms described in this submission will take time (months in the most optimistic case, years in the most likely case). If the capabilities and national security risks of AI systems continue to grow at the pace observed in 2022 and 2023, the need for highly effective controls could become acute in several years. This suggests that policymakers concerned about this issue should begin formulating policies and incentivizing the development of appropriate technologies now. Once the relevant security features have been mandated in the most powerful AI chips, they need not be used immediately: The mechanisms outlined in this submission would allow for rapid and flexible responses to new developments and threats once installed.


<sup>11</sup> Emily Benson and Catharine Mouradian, "Establishing a New Multilateral Export Control Regime," November 2, 2023, <a href="https://www.csis.org/analysis/establishing-new-multilateral-export-control-regime">https://www.csis.org/analysis/establishing-new-multilateral-export-control-regime</a>.

<sup>12</sup> "Implementation of Additional Export Controls: Certain Advanced Computing Items; Supercomputer and Semiconductor End Use; Updates and Corrections," Supplementary information section D.2, *88 Fed. Reg. 73458*, October 25, 2023, <a href="https://www.federalregister.gov/d/2023-23055/p-350">https://www.federalregister.gov/d/2023-23055/p-350</a>.

#### Definitions

This submission defines on-chip governance mechanisms as technical mechanisms that rely on hardware-level security features to:

1. Enable a *controller* to restrict what can be done with a hardware device; and/or
2. Enable a *verifier* to verify claims about the state or use of the hardware, based on a high level of trust in the integrity of the security mechanisms.

The proximate controller almost always would be the hardware vendor, but the de facto controller could be, for example, a regulator who mandates that particularly powerful hardware should not be made available to unlawful actors.<sup>13</sup> In a future, more comprehensive AI governance regime, a regulator could be both a verifier and a controller: For example, they could require AI developers to verifiably report that they are going about their development safely, and impose restrictions on developers who cannot prove this.<sup>14</sup>

This submission also uses the terms "compute user" and "compute operator." The user is the entity that uses chips in an operational capacity (e.g., a company that trains AI models). The operator is the entity that owns, physically controls, and manages the computing hardware (e.g., a cloud service provider). In some cases, the same entity will be both the compute user and the compute operator; in others, these entities will be distinct.<sup>15</sup> For specific definitions of other AI compute-related terms used in this submission, see Appendix A.

## 1 What Would Effective On-Chip Governance Look Like?

This section briefly lays out a sketch of a concrete vision for the set of on-chip mechanisms and associated measures that would allow for flexible compute governance. The core of this proposal is a hardened "security module," included on all high-performance data center AI chips, that can ensure that the chip has valid, up-to-date firmware and software and, where applicable, an up-to-date operating license. If these conditions were not met, the security module would prevent the chip from operating.

This valid, up-to-date firmware and software then could help enforce limits on the uses of these chips, and offer sophisticated "remote attestation" capabilities, or, in less technical terms, the ability for the chip to send trusted information about the chip and its usage to a third-party verifier. The security module also would ensure that if vulnerabilities are found in firmware and software, users would have no choice but to update to patched versions where the vulnerability has been fixed. Chip-specific operating licenses would allow export-controlled chips to be configured such that they could be remotely disabled by the manufacturer by ceasing to issue licenses for that chip. This would allow export controls to be enforced remotely if the terms of an export license had been violated. Chips also would have support for "trusted execution environments" that could, together with remote attestation capabilities, allow the chips to be used to make a wide range of "verifiable claims," such as the amount of compute used to train an AI model or other properties of the training process.
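A minimal sketch of the operating-license lifecycle described above, assuming each license binds a chip identifier to an expiry time and that the chip has a trustworthy clock. HMAC with a shared key stands in here for the issuer's digital signature (a real design would place only a public verification key on the chip). `LicenseIssuer`, `ChipLicenseCheck`, and the 30-day validity period are illustrative assumptions, not a proposed standard.

```python
from __future__ import annotations

import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class License:
    chip_id: str
    expires_at: int  # Unix seconds; assumes the chip has a trustworthy clock
    tag: bytes       # stand-in for the issuer's digital signature


def _payload(chip_id: str, expires_at: int) -> bytes:
    return f"{chip_id}|{expires_at}".encode()


def _tag(key: bytes, chip_id: str, expires_at: int) -> bytes:
    return hmac.new(key, _payload(chip_id, expires_at), hashlib.sha256).digest()


class LicenseIssuer:
    """Manufacturer (or regulator) side: renews licenses unless a chip is revoked."""

    def __init__(self, key: bytes, validity_seconds: int) -> None:
        self._key = key
        self.validity_seconds = validity_seconds
        self.revoked: set[str] = set()

    def renew(self, chip_id: str, now: int) -> License | None:
        if chip_id in self.revoked:
            return None  # enforcement = ceasing to issue licenses for this chip
        expires_at = now + self.validity_seconds
        return License(chip_id, expires_at, _tag(self._key, chip_id, expires_at))


class ChipLicenseCheck:
    """Chip side: operate only with a valid, unexpired license for this chip."""

    def __init__(self, chip_id: str, key: bytes) -> None:
        self.chip_id = chip_id
        self._key = key
        self.current: License | None = None

    def install(self, lic: License | None) -> None:
        if lic is not None:
            self.current = lic

    def may_operate(self, now: int) -> bool:
        lic = self.current
        if lic is None or lic.chip_id != self.chip_id or now >= lic.expires_at:
            return False
        return hmac.compare_digest(_tag(self._key, lic.chip_id, lic.expires_at), lic.tag)


# Usage with an arbitrary 30-day validity period.
THIRTY_DAYS = 30 * 24 * 3600
KEY = b"demo-key-not-for-production"
issuer = LicenseIssuer(KEY, validity_seconds=THIRTY_DAYS)
chip = ChipLicenseCheck("chip-0001", KEY)
chip.install(issuer.renew("chip-0001", now=0))
```

In this sketch revocation works by omission: the issuer simply stops renewing, and the chip stops operating once its last license expires. The validity period therefore sets the trade-off between how quickly a violation can be acted on and how long a chip can run without contacting the issuer.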

<sup>13</sup> It also may be possible for the hardware vendor to hand direct control over the mechanism to another entity, such as a government agency.

<sup>14</sup> This submission uses the term "AI developer" to refer to organizations or teams developing AI systems, not to individuals.

<sup>15</sup> For example, if a company runs its own servers on its own premises, it is both operator and user. If a company is using a cloud provider, the cloud provider is the operator, and the company is the user.

Implementing these features on AI chips provides a platform for *adaptive governance*. These features would allow a wide range of policies (for example, a training compute reporting requirement above a certain threshold, as called for by the recent White House executive order<sup>16</sup>) to be implemented and updated directly on the chip by deploying a firmware or software update. Many of the required security features are already common on CPUs and are increasingly being introduced on GPUs, such as NVIDIA's new H100.<sup>17</sup> These likely could be implemented at an acceptable cost as an extension of existing standards for secure boot and remote attestation.
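To illustrate a threshold rule that could be shipped in firmware, the sketch below tallies operations reported per chip for one training run and compares the total against a configurable threshold. The default of 10^26 operations mirrors the general reporting threshold in the October 2023 executive order. The per-chip throughput, the ability of firmware to attribute work to a run identifier (itself a nontrivial assumption), and all names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class ReportingPolicy:
    """Parameters a firmware/software update could change (hypothetical)."""
    threshold_ops: float = 1e26  # general threshold in the October 2023 executive order
    version: int = 1


@dataclass
class TrainingRunTally:
    """Operations attributed to one training run, keyed by chip (assumed attribution)."""
    run_id: str
    ops_by_chip: dict[str, float] = field(default_factory=dict)

    def record(self, chip_id: str, ops: float) -> None:
        self.ops_by_chip[chip_id] = self.ops_by_chip.get(chip_id, 0.0) + ops

    def total_ops(self) -> float:
        return sum(self.ops_by_chip.values())

    def requires_report(self, policy: ReportingPolicy) -> bool:
        return self.total_ops() > policy.threshold_ops


# Usage: 10,000 chips, each assumed to sustain 1e15 operations/s for 30 days.
SECONDS = 30 * 24 * 3600
run = TrainingRunTally("run-a")
for i in range(10_000):
    run.record(f"chip-{i}", 1e15 * SECONDS)

total = run.total_ops()  # roughly 2.6e25 operations
under_default = run.requires_report(ReportingPolicy())
under_tightened = run.requires_report(ReportingPolicy(threshold_ops=1e25, version=2))
```

Because the threshold lives in a policy object rather than in silicon, tightening it (here from 10^26 to 10^25, which flips the result for this run) would be a software change rather than a new chip generation.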

These technical features ideally would be supported by robust supply chain tracking and "Know Your Customer" policies for AI chip exports/sales, which would allow the controller to know which chips are being used by which actors. This system of supply chain tracking also could include periodic monitoring and inspections to ensure that any novel attempts to physically tamper with chips can be caught.

With this overall sketch as a framework, the next section describes in more detail the specific policies that these technical features could unlock.

## Applications beyond AI

This submission uses the term "AI chips," and primarily highlights the benefits of on-chip mechanisms for addressing AI-related national security concerns (specifically compute-intensive broadly capable systems). But the advanced chips referenced also play an important role in non-AI applications, such as design and testing for aerospace systems and nuclear weapons.<sup>18</sup> The measures discussed in this submission are highly relevant for these cases and, in general, wherever advanced chips are used in national security-relevant applications.

## 2 Policies that On-Chip Governance Mechanisms Could Enable

At a high level, on-chip governance mechanisms could allow a regulator to take the following actions:

1. **Restriction**: Restricting access to a chip or "throttling" (reducing) its performance. Such measures also could include preventing the chip from being used as part of a large cluster or supercomputer.
2. **Verification**: Requiring the chip user to securely verify how they are using the chip (e.g., which specific code or data is being used in an AI training run).
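The two restriction levers in item 1 can be sketched as a hypothetical policy that firmware might enforce: a throughput multiplier for throttling, and a cap on the number of high-bandwidth peer links as a crude proxy for cluster size. Whether a chip can reliably detect cluster membership is an open question this sketch does not address; `RestrictionPolicy`, `effective_throughput`, and `accept_peer_links` are illustrative names only.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RestrictionPolicy:
    """Hypothetical restriction settings pushed to a chip via firmware."""
    throughput_fraction: float = 1.0  # 1.0 = unthrottled, 0.5 = half of peak
    max_peers: Optional[int] = None   # cap on high-bandwidth peer links; None = no cap

    def __post_init__(self) -> None:
        if not 0.0 < self.throughput_fraction <= 1.0:
            raise ValueError("throughput_fraction must be in (0, 1]")
        if self.max_peers is not None and self.max_peers < 0:
            raise ValueError("max_peers must be non-negative")


def effective_throughput(policy: RestrictionPolicy, peak_ops_per_s: float) -> float:
    """Throttling: the usable fraction of the chip's peak throughput."""
    return peak_ops_per_s * policy.throughput_fraction


def accept_peer_links(policy: RestrictionPolicy, requested: list[str]) -> list[str]:
    """Cluster limit: accept at most `max_peers` links, in request order; refuse the rest."""
    if policy.max_peers is None:
        return list(requested)
    return list(requested[: policy.max_peers])


# Usage: a policy that halves throughput and allows at most 7 direct peers.
policy = RestrictionPolicy(throughput_fraction=0.5, max_peers=7)
requested = [f"gpu-{i}" for i in range(16)]
accepted = accept_peer_links(policy, requested)
```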

Details on the technical underpinnings of these capabilities are included in Section 3, and Section 4 discusses their viability in adversarial contexts.

<sup>16</sup> "Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence," the White House, October 30, 2023, https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence/.

<sup>17</sup> "NVIDIA H100 Tensor Core GPU Architecture Overview," NVIDIA, https://resources.nvidia.com/en-us-tensor-core/gtc22-whitepaper-hopper.

<sup>18</sup> Bureau of Industry and Security, "Implementation of Additional Export Controls: Certain Advanced Computing and Semiconductor Manufacturing Items; Supercomputer and Semiconductor End Use; Entity List Modification," October 13, 2022, https://www.federalregister.gov/documents/2022/10/13/2022-21658/implementation-of-additional-export-controls-certain-advanced-computing-and-semiconductor; Gregory C. Allen, "Blocking China's Access to AI Chips Matters to U.S. National Security," July 31, 2023, https://www.csis.org/analysis/blocking-chinas-access-ai-chips-matters-us-national-security; Liza Lin and Dan Strumpf, "China's Top Nuclear-Weapons Lab Used American Computer Chips Decades After Ban," Wall Street Journal, January 29, 2023, https://www.wsj.com/articles/chinas-top-nuclear-weapons-lab-used-american-computer-chips-decades-after-ban-11674990320.

## Verification vs. Monitoring

This submission uses the term "verification" to distinguish it from the idea of activity monitoring. "Monitoring" implies that some third party is able to track how a chip is being used (e.g., specific code or data loaded on the chip) through some process of unilateral surveillance. Such monitoring is likely neither technically feasible nor desirable from a user privacy and chip security standpoint. Building "back doors" into AI hardware is technically possible, but it would produce chips that customers would not want to buy and would introduce serious security vulnerabilities.<sup>19</sup>

"Verification" refers to a process where the user of a chip instead can remotely attest to a third-party verifier what they are doing with a processor (e.g., how much training compute is being used, or whether a particular dataset was used), using a "Trusted Execution Environment" (TEE). Because of the TEE's security properties, the verifier can trust that information received from the TEE has not been spoofed, so long as they have confidence that hardware security features on the chip have not been compromised. Instead of unilateral surveillance, this should be thought of as a collaboration between a verifier and the chip owner. This collaboration also could be made fully privacy-preserving (i.e., not revealing sensitive code or data) using techniques from multi-party and confidential computing.<sup>20</sup> If a chip owner refuses to engage in such a collaboration, restriction mechanisms could allow the verifier (e.g., a regulator or device manufacturer with particular terms of use) to prevent them from continuing to use the chip.

These actions will be appropriate only in certain contexts. Restriction mechanisms are appropriate in the adversarial context of export control enforcement, on the specialized data center AI chips that are targeted by current AI chip export controls. In the future, as chips grow more powerful, it may become necessary to place some export restrictions on consumer-grade GPUs.<sup>21</sup> These chips could then potentially be equipped with some limited mechanisms to deter smuggling and misuse.

In practice, restriction and verification could be used to enable the following policy measures:

- **Operating licenses**: Using hardware-enforced licenses to deny access to unauthorized users (e.g., for export control enforcement).
- **Location verification**: Verifying the location of chips (e.g., to assist with export control enforcement).
- **Usage verification**: Verifying how chips are being used (e.g., to enforce an international agreement on tracking and reporting compute usage).<sup>22</sup>
- **Usage limitations**: Limiting certain chip use cases (e.g., to restrict exported chips from being used to build large AI clusters capable of training frontier models).<sup>23</sup>

<sup>19</sup> Jeff Goldman, "Chip Backdoors: Assessing the Threat," Semiconductor Engineering, August 4, 2022, https://semiengineering.com/chip-backdoors-assessing-the-threat/.

<sup>20</sup> For example, an auditor could run tests on model weights without having direct access to the unencrypted weights, or could obtain proof about which training data was used to produce a set of model weights. See Confidential Computing Consortium, "Confidential Computing"; Dami Choi, Yonadav Shavit, and David Duvenaud, "Tools for Verifying Neural Models' Training Data," arXiv, July 2, 2023, https://doi.org/10.48550/arXiv.2307.00682.

<sup>21</sup> In the most recent update to its semiconductor export controls, BIS has added a notification requirement for exports of consumer chips with AI-relevant capabilities. "Implementation of Additional Export Controls: Certain Advanced Computing Items; Supercomputer and Semiconductor End Use; Updates and Corrections," Supplementary information section C.2, 88 Fed. Reg. 73458, October 25, 2023, https://www.federalregister.gov/d/2023-23055/p-204.

<sup>22</sup> See Megan Lamberth and Paul Scharre, "Arms Control for Artificial Intelligence," *Texas National Security Review* 6, no. 2 (Spring 2023): 95–110, https://doi.org/10.26153/TSW/46142; Mauricio Baker, "Nuclear Arms Control Verification and Lessons for AI Treaties," arXiv, April 8, 2023, https://doi.org/10.48550/arXiv.2304.04123; Mittelsteadt, "AI Verification."

<sup>23</sup> This specific use case is highlighted in a recent request for public comment from the Bureau of Industry and Security: "Public Information on Export Controls Imposed on Advanced Computing and Semiconductor

The rest of this section will discuss each of these in more detail.

#### 2.1 Operating Licenses to Prevent Unauthorized Use

On-chip mechanisms could be used to implement a chip-specific operating license that requires periodic renewal, similar to a software subscription model. Operating licenses could control whether the chip works at all, limit specific features, or specify more complex restrictions. Importantly, on-chip mechanisms could implement a time-based license, where a chip disables itself if it does not receive a renewed license. This approach removes the need to deliver an active shutdown command to the chip, which an uncooperative compute operator likely could block.

Hardware-based operating licenses already are used in commercial contexts; two U.S. companies, Intel and IBM, run hardware licensing programs under the names Intel On Demand and Capacity on Demand respectively.<sup>24</sup> In these cases, operating licenses are used to restrict or unlock existing features on chips, depending on whether a customer has paid for them.

This capability would be particularly useful for export control enforcement, for example if a chip were sold to an entity that subsequently was found to have previously unknown ties to the People's Liberation Army.<sup>25</sup> In practice, this might take the form of a Bureau of Industry and Security statement that export licenses for controlled chips will be granted if the chips have a security module that could be used to disable them remotely whenever there is reason to believe they have been used by end users, or for end uses, that breach the export license. This could include:

1. Cases where there is reason to believe that chips have been, or are at risk of being, re-exported or transferred in violation of their original export license.
2. Cases where there is reason to believe that remote access to the chips has been given to sanctioned entities, such as those connected to the Chinese military (if controls on AI chips offered as cloud services are implemented).
3. Cases where the owner of the chips is not collaborating with authorities to prove that neither of the two violations mentioned above is occurring.

While an operating license mechanism could require some communication between the chip and the manufacturer, the core functionality would not require an open internet connection. The license could be conveyed to and from the chips by whatever means are most appropriate, whether that be an internet connection, or carefully controlled physical media going in and out of an air-gapped data center.

More speculatively, it may be possible to use operating licenses to make consumer GPUs less useful for AI applications, by using a license to unlock some of the most AI-relevant features and capabilities of the GPU. Such mechanisms are not currently needed, but may become useful in the future.

#### 2.2 Location Verification

Combining trusted location verification with operating licenses could allow for rapid and effective export control enforcement. How would this work? Due to the hard limit of the speed of light and the lower bound on latency from existing communications infrastructure, how quickly a device responds can be used to establish an upper bound on the distance between the device and the source of the query. With secure on-chip mechanisms and multiple trusted "landmark" servers, it becomes possible to determine the approximate location of a chip by comparing these upper bounds, as depicted in the diagram below.
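The distance bound described above is simple arithmetic: a reply that completes a round trip in *t* milliseconds cannot have come from farther than the speed of light times *t*, divided by two. The minimal Python sketch below makes that concrete, assuming a spherical Earth and illustrative city coordinates; the function names and coordinates are hypothetical and not part of any deployed system. It uses the exact vacuum speed of light (about 149.9 km per millisecond of round trip), which matches the rule of thumb of 150 km per millisecond used in footnote 26.

```python
import math

C_KM_PER_MS = 299.792458   # speed of light in vacuum, km per millisecond
EARTH_RADIUS_KM = 6371.0   # mean Earth radius (spherical approximation)


def max_distance_km(rtt_ms):
    """Upper bound on landmark-to-chip distance implied by a round-trip time.

    The signal travels out and back, so the one-way distance is at most half of
    (speed of light x round-trip time). Real networks are slower than light in
    vacuum, so the true distance is usually far smaller than this bound.
    """
    return C_KM_PER_MS * rtt_ms / 2.0


def great_circle_km(a, b):
    """Haversine distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def rules_out(landmark, rtt_ms, location):
    """True if a chip that answered `landmark` within `rtt_ms` cannot be at `location`."""
    return great_circle_km(landmark, location) > max_distance_km(rtt_ms)


# Illustrative coordinates only.
PARIS = (48.8566, 2.3522)
MOSCOW = (55.7558, 37.6173)

print(round(max_distance_km(9.0)))    # 1349 (km)
print(rules_out(PARIS, 9.0, MOSCOW))  # True: Moscow is too far for a 9 ms reply
```

Checking several landmarks works the same way: a location is excluded if any single landmark's measurement rules it out.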

Manufacturing Items to the People's Republic of China (PRC)," https://www.bis.doc.gov/index.php/about-bis/newsroom/2082.

<sup>24</sup> "Intel On Demand," Intel, https://www.intel.com/content/www/us/en/products/docs/ondemand/overview.html; "Capacity on Demand - IBM Documentation," IBM, February 8, 2022, https://www.ibm.com/docs/en/power9?topic=environment-capacity-demand.

<sup>25</sup> Reinsch and Benson, "Digitizing Export Controls."



This diagram shows how a trusted server in Paris, France, could verify that a chip is within the blue circle, and thus not in any country to which chip exports are restricted, by verifying that the chip can respond in less than 9 ms (using the upper bound of the speed of light). The smaller red circle shows an approximate range from which chips could attain a 9 ms latency to the server, using ordinary internet infrastructure. This shows how landmark servers placed every few hundred kilometers could be used to establish sufficient coverage to verify that chips have not been re-exported illegally.

Because ordinary internet infrastructure is substantially slower than the speed of light, chips would need to be within hundreds of kilometers of the landmarks, and within less than 100 km in areas near the borders of regions where chips are not allowed to operate. This likely would require hundreds of trusted landmarks globally, but these servers would be quite cheap to set up. Queries would take the form of cryptographic challenges that the chip must sign with its private key, to ensure that the responder is indeed the chip in question.
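A single landmark measurement could be sketched as follows: issue a fresh random nonce, time the reply, check that the reply is bound to that nonce, and only then convert the elapsed time into a distance bound. This is a hypothetical sketch. In particular, HMAC with a per-chip key stands in for the chip's private-key signature only because Python's standard library has no public-key signature API; a real design would have the chip sign with its own private key so that landmarks hold only public keys. The `chip_respond` function simulates the chip, and all names are illustrative.

```python
import hashlib
import hmac
import secrets
import time
from typing import Callable, Optional

C_KM_PER_MS = 299.792458  # speed of light in vacuum, km per millisecond


def chip_respond(chip_key: bytes, nonce: bytes) -> bytes:
    """Simulated chip: authenticates the landmark's nonce with its key.

    HMAC is a stand-in for a private-key signature (hypothetical; see lead-in).
    """
    return hmac.new(chip_key, nonce, hashlib.sha256).digest()


def measure(chip_key: bytes, respond: Callable[[bytes], bytes]) -> Optional[float]:
    """Landmark: return an upper bound on the chip's distance in km, or None."""
    nonce = secrets.token_bytes(32)        # fresh, so a recorded reply cannot be replayed
    start = time.perf_counter()
    reply = respond(nonce)                 # in reality, a network round trip
    rtt_ms = (time.perf_counter() - start) * 1000.0
    expected = hmac.new(chip_key, nonce, hashlib.sha256).digest()
    if not hmac.compare_digest(reply, expected):
        return None                        # the reply did not come from this chip
    return C_KM_PER_MS * rtt_ms / 2.0


key = secrets.token_bytes(32)
print(measure(key, lambda n: chip_respond(key, n)))           # a small bound, in km
print(measure(key, lambda n: chip_respond(b"other-key", n)))  # None
```

Because the reply must commit to a nonce the chip has never seen before, it cannot be precomputed; a chip operator can only add delay, which widens the bound rather than making the chip appear closer.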

This kind of location verification mechanism would be particularly valuable for deterring chip smuggling. Of course, a device could always delay its response to reduce the precision of the location estimate. Compute operators could be incentivized to cooperate through a policy of revoking operating licenses whenever a measurement is too imprecise to verify that the chip is not in a restricted country. Alternatively, the chip itself could query a set of trusted servers, and the operating license could specify that the chip should lock down if it cannot establish that it is in an allowed region. This approach also could allow a chip to establish its own location without the landmark servers being able to determine the chip's location, which could be desirable in some cases to protect user privacy. This kind of "region-lock" mechanism could potentially also be useful on consumer GPUs in the future, if the smuggling of such chips becomes a serious concern.

<sup>26</sup> The speed of light is just under 300 km/ms. This means that if a chip or server responds in y ms, it is at most $y \times 150$ km away. In the case depicted in the diagram, by operating a trusted landmark server in Paris, France, we can be perfectly confident that any chip responding to a query from that server in less than 9 ms cannot be in Russia (Russia's Kaliningrad exclave is 9 × 150 = 1,350 km from Paris). Using ordinary internet infrastructure, which is substantially slower than the speed of light, chips as distant as London and Brussels can achieve round-trip latencies below 9 ms to Paris. This means that, in Western Europe, landmark servers spaced a few hundred kilometers apart would be sufficient to allow chips to verify that they are not in Russia.

<sup>27</sup> Using speed-of-light communication via, e.g., radio, the red circle could be expanded to be essentially equivalent to the blue circle. In many cases, it also may be possible to place the trusted server in the same data center as the AI chips in question, allowing much greater precision.

#### 2.3 Usage Verification

Continued progress in AI may create and exacerbate scenarios that resemble a "security dilemma."<sup>28</sup> For example, if one country were unsure about a rival's intentions or activities related to developing AI-powered military capabilities, it may be rational for that country to develop or accelerate the development of its own capabilities. Uncertainty about the specific capabilities of rivals, how AI might change the shape of warfare, and exactly how powerful future AI systems might be could all exacerbate this dynamic. This could lead to incentives to prioritize dangerous capabilities research at the expense of safety research, increasing the chance of accidents that could cause harm to all actors.<sup>29</sup> Recent trends in military adoption of AI technology suggest these dynamics are at risk of emerging between Washington and Beijing.<sup>30</sup>

As with most security dilemmas, a promising move is to reduce mutual uncertainty about how and whether potentially dangerous systems are being developed by any actor. Just as monitoring and verification technologies have been used to support international agreements and mutual trust in the nuclear domain, on-chip mechanisms could support similar moves in the AI domain.<sup>31</sup> Two key points of difference between these domains are that the number of different actors involved in developing new technologies is likely to be greater in the AI domain, and those actors are much more likely to be commercial actors.

On-chip mechanisms could allow compute users (commercial or otherwise) to make verifiable claims (information that is trustworthy through hardware-level integrity guarantees) about the state of a chip and how it is being used. These features could be extended to the level of an entire cluster and enable compute users to verify key information relevant to AI capabilities and risks, such as the amount of training compute used to train an AI system or other properties of the training process. For example, one recent proposal describes how "hashed" (i.e., privacy-preserving) parts of an AI system could be stored, and later used to prove how much compute was used to train it.<sup>32</sup>
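One generic way to make such stored records tamper-evident is a hash chain over training checkpoints: each entry commits to the previous one, so the log cannot later be edited without changing the final value that was reported. The sketch below is a hypothetical illustration of that general idea, not the protocol in the cited proposal; the checkpoint contents and function names are placeholders.

```python
import hashlib


def extend(prev_commit: bytes, checkpoint: bytes, step: int) -> bytes:
    """Commit to a checkpoint, its training step, and every earlier entry."""
    return hashlib.sha256(prev_commit + step.to_bytes(8, "big")
                          + hashlib.sha256(checkpoint).digest()).digest()


def chain_head(checkpoints) -> bytes:
    """Fold a sequence of (step, checkpoint_bytes) pairs into one commitment."""
    head = b"\x00" * 32                    # fixed starting value
    for step, ckpt in checkpoints:
        head = extend(head, ckpt, step)
    return head


log = [(1000, b"weights@1000"), (2000, b"weights@2000")]
reported = chain_head(log)                 # e.g. signed inside a TEE and shared

# Later, an auditor replays the revealed log and compares it with the report.
print(chain_head(log) == reported)                                           # True
print(chain_head([(1000, b"weights@1000"), (2000, b"forged")]) == reported)  # False
```

Only the 32-byte head needs to be shared at training time; the checkpoints themselves can stay private until an audit.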

Many of the security features necessary for verifiable claims are already available on high-end server CPUs, as well as NVIDIA's flagship H100 GPU. In recent years, this has been marketed as "confidential computing" and promoted by the Confidential Computing Consortium, of which NVIDIA is a member.<sup>33</sup>

#### 2.4 Usage Limitations

On-chip mechanisms could be used to limit the possible uses of chips in various ways. The most relevant to this submission are limiting AI chip usage in large clusters/supercomputers, limiting sensitive data access to support privacy and information security, and limiting chips to only running approved code or models. Each of these applications is discussed below.

<sup>28</sup> The term "security dilemma" was introduced by John Herz, "Idealist Internationalism and the Security Dilemma," World Politics vol. 2, no. 2 (1950): 157–180, at p. 157. For an overview of how this concept is relevant to AI, see "Artificial Intelligence and the Security Dilemma," Brookings, https://www.brookings.edu/articles/artificial-intelligence-and-the-security-dilemma/.

<sup>29</sup> Center for Security and Emerging Technology, "AI Accidents: An Emerging Threat," https://cset.georgetown.edu/publication/ai-accidents-an-emerging-threat/; Dan Hendrycks, Mantas Mazeika, and Thomas Woodside, "An Overview of Catastrophic AI Risks," arXiv, October 9, 2023, http://arxiv.org/abs/2306.12001.

<sup>30</sup> "U.S.-China Competition and Military AI," https://www.cnas.org/publications/reports/u-s-china-competition-and-military-ai.

<sup>31</sup> For an overview of nuclear monitoring and verification technologies, see "Assessment of Nuclear Monitoring and Verification Technologies," Department of Defense (Defense Science Board), January 2014, https://media.nti.org/pdfs/Assessment of Nuclear Monitoring and Verification Technologies.pdf.

<sup>32</sup> Shavit, "What Does It Take to Catch a Chinchilla?"

<sup>33</sup> "Members," Confidential Computing Consortium, https://confidentialcomputing.io/about/members/.

#### Limiting AI chip usage in large clusters/supercomputers

In the October 2023 revisions to AI chip export controls, BIS requested public proposals for "technical solutions that limit [AI chips] from being used in conjunction with large numbers of other such items in ways that enable training large dual-use AI foundation models with capabilities of concern."<sup>34</sup> If this kind of usage were prevented, chips could be safely exported for end uses that only require a smaller number of chips.

As part of the request, BIS mentions an example mechanism in which the chips that make up a single system, such as a server or a "pod" of servers, are limited to operating only with the original set of chips, and the whole system is limited to communicating with the outside world at less than 1 GB/s. This kind of restriction could be based on "roots of trust" in each of the chips in the system that allow all of the chips to attest to each other's identity. Chips would then refuse to work with any chip they do not recognize, which would prevent the end user from introducing additional network connections that would allow the system to be integrated as part of a larger cluster.
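The roster-and-cap mechanism described above could be sketched as a fixed set of device identities recorded when the system is assembled, plus a check on external traffic. In this hypothetical sketch, device IDs are plain strings and the attestation that would prove each ID is assumed to have already succeeded; the 1 GB/s figure is the example from the BIS request, and the class and method names are illustrative only.

```python
from dataclasses import dataclass, field

# Example external-bandwidth cap from the BIS request (1 GB/s = 10**9 bytes per second).
EXTERNAL_CAP_BYTES_PER_S = 1_000_000_000


@dataclass
class PodGuard:
    """Hypothetical policy enforced by each chip's security firmware."""

    roster: frozenset                       # attested IDs of chips present at assembly
    peers: set = field(default_factory=set)

    def admit_peer(self, attested_id: str) -> bool:
        """Open a high-bandwidth link only to a chip from the original roster."""
        if attested_id not in self.roster:
            return False                    # unrecognized chip: refuse to cooperate
        self.peers.add(attested_id)
        return True

    def external_ok(self, bytes_out: int, seconds: float) -> bool:
        """Check that traffic leaving the system stays below the cap."""
        return bytes_out / seconds < EXTERNAL_CAP_BYTES_PER_S


guard = PodGuard(roster=frozenset({"gpu-0", "gpu-1", "nic-0"}))
print(guard.admit_peer("gpu-1"))                  # True: part of the original system
print(guard.admit_peer("gpu-from-another-pod"))   # False: would extend the cluster
print(guard.external_ok(2_000_000_000, 1.0))      # False: 2 GB/s exceeds the cap
```

The roster is frozen at assembly time, so adding a chip later requires a new, signed roster rather than a configuration change by the operator.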

A mechanism like this would require a very high degree of interoperability between all of the chips in the system, including, for example, the CPU and the network interface controller. Existing chips could not do this, but fortunately, the data center industry already is working to develop standards and protocols to allow heterogeneous devices found in data centers to attest their identity and integrity to each other using such mechanisms.<sup>35</sup> However, this level of interoperability could be at least 2 years away, based on an interview with an industry expert.

Sophisticated on-chip attestation mechanisms should be complemented by lower-tech physical protections that make it difficult to modify the system without damaging it or leaving evidence of modification. This could involve techniques such as "potting": covering the circuit board in difficult-to-remove material. See Appendix B for a dedicated discussion of anti-tampering technologies.

In the future, this kind of attestation mechanism could be extended to implement more flexible restrictions on the use and configuration of computing clusters. In addition to identifying each other, AI chips could share relevant information to detect if they are part of a very large computing workload (e.g., large-scale AI training). For example, each AI chip in a server could track how much data is moving in and out of itself. This information then could be used to estimate the total amount of data being moved to and from the whole server, and therefore detect whether the chips are being used within a large, tightly connected cluster of multiple servers. However, this kind of system could potentially be broken by compromising a small number of the least secure devices involved, making it relatively fragile.
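The traffic-based detection idea could be sketched as each chip reporting how many bytes crossed the server boundary during a time window, with the server-wide total compared against a threshold. The threshold, window length, and function name below are hypothetical; choosing a threshold that reliably separates a large training cluster from other workloads would be an empirical question.

```python
def looks_like_large_cluster(per_chip_external_bytes, window_s, threshold_bytes_per_s):
    """Flag a server whose chips together move too much data to or from other servers.

    per_chip_external_bytes: bytes each chip exchanged with devices outside its
    own server during the window (hypothetical per-chip counters).
    """
    server_rate = sum(per_chip_external_bytes) / window_s   # estimated server-wide rate
    return server_rate > threshold_bytes_per_s


# Eight chips, each moving 50 GB off-server in a 10 s window: 40 GB/s in total.
print(looks_like_large_cluster([50e9] * 8, 10.0, threshold_bytes_per_s=5e9))  # True
# Light traffic stays under the same threshold.
print(looks_like_large_cluster([1e6] * 8, 10.0, threshold_bytes_per_s=5e9))   # False
```

The sketch also shows the fragility noted above: a single compromised chip that under-reports its counter lowers the total directly.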

#### Limiting sensitive data access

On-chip mechanisms could support information security and privacy practices. For example, when an AI system is deployed, on-chip mechanisms could be used to ensure a user's data is processed without either the AI developer or the user being able to access the other party's intellectual property (data or model weights). Beyond their commercial utility, such features may become increasingly important as AI systems develop further capabilities in domains with high potential for misuse, such as biology.<sup>36</sup>

<sup>34</sup> "Implementation of Additional Export Controls: Certain Advanced Computing Items; Supercomputer and Semiconductor End Use; Updates and Corrections," Supplementary information section D.2, 88 Fed. Reg. 73458, October 25, 2023, https://www.federalregister.gov/d/2023-23055/p-350.

<sup>35</sup> See, for example, the Caliptra root of trust being developed by the CHIPS Alliance: CHIPS Alliance, "Caliptra: A Datacenter System on a Chip (SoC) Root of Trust (RoT)," GitHub, spec.caliptra.io. Other related efforts are discussed in this recent NIST report: Michael Bartock, Murugiah Souppaya, Ryan Savino, Tim Knoll, Uttam Shetty, Mourad Cherfaoui, Raghu Yeluri, et al., "Hardware-Enabled Security: Enabling a Layered Approach to Platform Security for Cloud and Edge Computing Use Cases," Gaithersburg, MD: National Institute of Standards and Technology (U.S.), May 4, 2022, https://doi.org/10.6028/NIST.IR.8320.

<sup>36</sup> Jonas B. Sandbrink, "Artificial Intelligence and Biological Misuse: Differentiating Risks of Language Models and Biological Design Tools," arXiv, August 12, 2023, http://arxiv.org/abs/2306.13952.

#### Limiting AI chips to only running approved code or models

On-chip mechanisms could be used to ensure that only approved code and/or AI models can be run on the processor. This could allow a subset of chips intended for specific uses (e.g., those for use in self-driving cars) to be configured to run only specific, trusted models. This could prevent some kinds of misuse without requiring much active oversight of the chips.
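Restricting a chip to approved models could be sketched as the security firmware hashing each model before it is loaded and comparing the digest with an allowlist. In this hypothetical sketch the allowlist contents and function name are illustrative, and the allowlist is assumed to have been delivered in signed form (signature checking is omitted here).

```python
import hashlib

# Hypothetical allowlist of SHA-256 digests of approved model files,
# e.g. provisioned by the vendor for a self-driving-car deployment.
APPROVED = {hashlib.sha256(b"approved-driving-model-v1").hexdigest()}


def may_load(model_bytes: bytes, approved=APPROVED) -> bool:
    """Allow loading only if the model's digest is on the allowlist."""
    return hashlib.sha256(model_bytes).hexdigest() in approved


print(may_load(b"approved-driving-model-v1"))  # True
print(may_load(b"some-other-model"))           # False
```

Because the check depends only on the model's bytes, any change to the weights, however small, produces a different digest and is refused.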

## 3 Technical Underpinnings

This section begins by explaining the basic operating principles of the core hardware security features—secure boot and remote attestation—that most restriction and verification mechanisms are based on. It then explains how these features could be applied in security modules and trusted execution environments to enable on-chip governance. Each of these key components is shown in the diagram below.



#### Cryptographic signatures

On-chip mechanisms rely heavily on cryptographic signatures (also known as digital signatures), a way of verifying the authenticity of a file or message using public key cryptography.

**Public key cryptography** is a system that uses two different mathematically related codes, called keys, to encrypt and decrypt data. One key is public, while the other key is private and must be kept protected.

A **cryptographic signature** is a sequence of bits that can be used to verify the authenticity of a file or message. It is created using the file and a private code (referred to as a "private key"). Recipients of the file then can use a corresponding "public key" to verify that the signature is valid and that the file comes from the owner of the private key and has not been modified in transit.
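As a concrete illustration of signing and verifying, the toy Python sketch below implements textbook RSA signatures over a SHA-256 hash. It rests on loud assumptions: the key is built from two small, publicly known Mersenne primes and there is no padding scheme, so it offers no real security. Production hardware uses standardized signature schemes with much larger keys.

```python
import hashlib

# Toy RSA key built from two known Mersenne primes (2**61 - 1 and 2**89 - 1).
# Far too small for real use; for illustration only.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q                              # public modulus
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (Python 3.8+)


def digest(message: bytes) -> int:
    """Hash the message and map it into the range [0, N)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Only the holder of the private exponent D can compute this."""
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Anyone with the public key (N, E) can check the signature."""
    return pow(signature, E, N) == digest(message)


sig = sign(b"firmware v1.2")
print(verify(b"firmware v1.2", sig))   # True
print(verify(b"firmware v1.3", sig))   # False: the file was modified
```

The asymmetry is the point: verification needs only `(N, E)`, which can be published or burned into hardware, while producing a valid signature requires `D`.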

#### 3.1 Secure Boot

"Secure boot" is a hardware feature that aims to prevent unauthorized firmware, operating systems, or other software from running on a device.<sup>37</sup> When a chip is turned on (booted), the part of the chip that is responsible for loading the initial firmware code onto the chip checks whether the code has been cryptographically signed by the chip's manufacturer, and refuses to boot if not. This ensures that the chip will run only manufacturer-approved firmware. This typically works as follows:

1. The manufacturer generates a pair of keys: a public key and a private key.
2. The manufacturer stores the public key in read-only memory on the device.
3. The manufacturer signs the firmware with the private key, creating a signature for it.
4. The manufacturer sends the firmware and the signature to the device. The device uses the public key to verify that the signature matches the firmware.
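The four steps above could be sketched as follows, using a toy RSA construction (insecure, illustration only). The device side holds only the manufacturer's public key, while the private exponent stays with the manufacturer. Names such as `boot_rom_check` are hypothetical.

```python
import hashlib

# Manufacturer side. Toy RSA key from two Mersenne primes: insecure, illustration only.
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537                    # step 1: public key (N, E) ...
D = pow(E, -1, (P - 1) * (Q - 1))      # ... and private exponent D, kept by the manufacturer

ROM_PUBLIC_KEY = (N, E)                # step 2: stored in the device's read-only memory


def digest(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


firmware = b"vendor firmware image (placeholder bytes)"
signature = pow(digest(firmware), D, N)   # step 3: sign the firmware


def boot_rom_check(image: bytes, sig: int, key=ROM_PUBLIC_KEY) -> bool:
    """Step 4: the boot ROM refuses to run the image unless the signature matches."""
    n, e = key
    return pow(sig, e, n) == digest(image)


print(boot_rom_check(firmware, signature))                # True: boot proceeds
print(boot_rom_check(firmware + b" patched", signature))  # False: boot refused
```

Nothing on the device side is secret; an attacker who reads the ROM learns only the public key, which is why secure boot needs integrity protection for that key but not confidentiality.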

Secure boot does not require the device to have any secret information, such as a private key. It only needs to protect the public key from being overwritten. Remote attestation, on the other hand, requires that chips be able to sign outputs so that they can be verified to have come from that chip. This means that the chip itself needs to hold its own private key and prevent anyone from reading it; otherwise, whoever reads the key can forge attestations. Remote attestation is discussed in more detail next.

#### 3.2 Remote Attestation

The same functionality that is used to check the integrity of the configuration and firmware of a chip as part of secure boot can be extended to allow the hardware to securely and remotely attest to (i.e., make claims about) the state of the system.<sup>38</sup> This is known as "remote attestation." In a remote attestation procedure, the chip generates a signature for the currently loaded firmware (and other measurements about the chip's state) using its own private key, and sends that signature to a verifier (e.g., the manufacturer). The verifier can then use the signature to ensure that the chip is running approved firmware or has a valid "operating license" (discussed below). This overall process is depicted in the diagram below. Remote attestation capabilities make it possible for a remote party to have some degree of control over how a chip is being used, particularly in combination with "trusted execution environments" (discussed below). Such features could be especially useful in the export control context, where an exporter could retain the ability to remotely restrict access to a chip if an export control violation has been detected via remote attestation.
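A single attestation round could be sketched as below: the verifier sends a fresh nonce, the chip signs a report containing its firmware measurement and that nonce with its own private key, and the verifier checks the signature, the nonce, and that the measurement is on an approved list. This is a hypothetical sketch using the same insecure toy RSA construction as a stand-in for real signatures; the report format and names are assumptions, and real attestation protocols (such as those described in the RATS architecture cited here) carry considerably more structure.

```python
import hashlib
import secrets

# Chip's own key pair (toy RSA, illustration only). In hardware, D would be
# generated or provisioned inside the chip and never leave it.
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))


def digest(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def chip_attest(firmware: bytes, nonce: bytes):
    """Chip: measure loaded firmware, bind it to the nonce, sign with its private key."""
    measurement = hashlib.sha256(firmware).hexdigest()
    report = measurement.encode() + b"|" + nonce
    return report, pow(digest(report), D, N)


def verifier_check(report: bytes, sig: int, nonce: bytes, approved: set) -> bool:
    """Verifier: check the chip's signature, freshness, and approved firmware."""
    if pow(sig, E, N) != digest(report):
        return False                       # not signed by this chip's key
    measurement, _, reported_nonce = report.partition(b"|")
    return reported_nonce == nonce and measurement.decode() in approved


good_fw = b"approved firmware"
approved = {hashlib.sha256(good_fw).hexdigest()}
nonce = secrets.token_bytes(16)
report, sig = chip_attest(good_fw, nonce)
print(verifier_check(report, sig, nonce, approved))                    # True
print(verifier_check(report, sig, secrets.token_bytes(16), approved))  # False: stale reply
```

The nonce is what makes the claim current: without it, an operator could replay an old report captured before the firmware was changed.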

<sup>37</sup> "Secure Boot," Microsoft, February 8, 2023, https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/oem-secure-boot; OCP Security workgroup, "Hardware Secure Boot," Open Compute Project, 2021, 7, https://www.opencompute.org/documents/secure-boot-2-pdf. This submission is specifically interested in enforced secure boot. Often, secure boot without enforcement is provided as an option that the hardware user can enable or disable, particularly on PCs. This can provide valuable protection against malware but obviously does not restrict the user's behavior.

<sup>38</sup> OCP security workgroup, "Attestation of System Components v1.0 Requirements and Recommendations," November 4, 2020, https://www.opencompute.org/documents/attestation-v1-0-20201104-pdf; Henk Birkholz et al., "Remote ATtestation procedureS (RATS) Architecture," Request for Comments (RFC Editor, January 2023), https://www.rfc-editor.org/info/rfc9334.



#### 3.3 Security Modules

To implement techniques such as secure boot, many chips today have dedicated security modules, including a dedicated processor, that are responsible for handling private keys and performing other security-related functions. For the purposes of on-chip governance mechanisms such as operating licenses, a security module would need to perform responsibilities such as:

- Secure boot, including measuring, enforcing, and attesting to firmware integrity
- Enabling secure remote firmware updates
- Handling private keys and cryptographic operations to support verifiable claims
- General oversight of the behavior of the chip
- Attesting to device identity

To implement an operating license (see Section 2.1), a security module would need to have the ability to limit or disable a chip's operations if the chip does not receive a renewed license within a particular time window. The format of the license could be a short piece of text, cryptographically signed by the compute vendor. The text should include the identifier of the chip in question and information about the ways in which it is authorized to operate, and for how long. The firmware running on the security module would interpret and enforce this license. To support this functionality, the security module would need to have access to an immutable ID corresponding to the chip it is responsible for.
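The license format described above could be sketched as a small signed record holding the chip's immutable ID, the features it unlocks, and an expiry time, which the security module checks before enabling anything. This is a hypothetical sketch: the field names, the JSON encoding, and the toy RSA vendor key (insecure, illustration only) are assumptions, and the current time is passed in explicitly because a trustworthy on-chip time source is itself hard to build.

```python
import hashlib
import json

# Vendor's toy RSA key (insecure; illustration only). Only (N, E) lives on the chip.
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))


def digest(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def issue_license(chip_id: str, features: list, valid_until: int):
    """Vendor side: encode the license terms and sign them."""
    body = json.dumps({"chip_id": chip_id, "features": sorted(features),
                       "valid_until": valid_until}, sort_keys=True).encode()
    return body, pow(digest(body), D, N)


def enabled_features(body: bytes, sig: int, my_chip_id: str, now: int) -> set:
    """Security-module side: return the features this chip may use right now."""
    if pow(sig, E, N) != digest(body):
        return set()                       # forged or altered license
    terms = json.loads(body)
    if terms["chip_id"] != my_chip_id:
        return set()                       # license issued for a different chip
    if now >= terms["valid_until"]:
        return set()                       # expired: the chip locks itself down
    return set(terms["features"])


body, sig = issue_license("chip-0042", ["compute", "interconnect"], valid_until=1_000)
print(sorted(enabled_features(body, sig, "chip-0042", now=500)))  # ['compute', 'interconnect']
print(enabled_features(body, sig, "chip-0042", now=1_500))        # set()
```

Because an empty set is the outcome whenever the license is missing, stale, forged, or issued for another chip, the chip degrades to a locked state on its own when renewals stop; no shutdown command ever has to reach it.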

With a timed license expiry period (e.g., weekly or monthly), chip vendors could disable chips without any active intervention being required. The authors expect that a timed license is the only way to implement a robust mechanism for remotely disabling chips: if the mechanism relied on a shutdown command being actively delivered to the chip, the command almost always could be blocked from reaching the chip by the compute operator.

Another technical requirement for properly implementing a hardware operating license is a secure timer. Accurate, hack-proof, and tamper-proof tracking of time generally is considered very difficult.<sup>39</sup> The main reason for this is that, currently, it is within the capabilities of many actors to compromise timers by manipulating the power supply to the chip, and thus manipulating the execution speed of instructions.<sup>40</sup> However, for the purposes of controlling access to or usage of AI compute, the primary concern is with the amount of computation done since authorization was received, rather than the exact amount of time.<sup>41</sup> This can be tracked much more robustly by simply counting clock cycles.<sup>42</sup>

<sup>39</sup> Ross Anderson, Security Engineering: A Guide to Building Dependable Distributed Systems, 3rd ed. (Hoboken, NJ: John Wiley & Sons, 2020), 250–51.

It also could be possible to achieve a usable approximation of time if the relevant parts of the chip were continuously powered. This could be achieved with an added battery that could continue to power the relevant part of the chip even when the rest of the system is powered off.<sup>43</sup> It also might be possible to require the surrounding system to provide continuous low levels of power to the chip by designing the timer to "max out" if power is lost, thus requiring re-authorization in the event of a loss of power.
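The two ideas above, metering computation rather than wall-clock time and treating a loss of power as exhausting the budget, could be combined in a small state machine. This is a hypothetical sketch: the class and method names, and the assumption that the security module can observe both cycle counts and power-loss events, are illustrative and not drawn from any existing chip.

```python
class ComputeMeter:
    """Hypothetical security-module counter gating compute on an authorized budget."""

    def __init__(self):
        self.remaining_cycles = 0          # no authorization until a license arrives

    def authorize(self, cycle_budget: int) -> None:
        """Called after a valid, signed license renewal has been checked."""
        self.remaining_cycles = cycle_budget

    def run(self, cycles: int) -> bool:
        """Consume cycles; refuse work that would exceed the authorized budget."""
        if cycles > self.remaining_cycles:
            return False
        self.remaining_cycles -= cycles
        return True

    def on_power_loss(self) -> None:
        """'Max out' the meter: losing power forces re-authorization."""
        self.remaining_cycles = 0


meter = ComputeMeter()
meter.authorize(100)
print(meter.run(60))   # True: 40 cycles left
print(meter.run(50))   # False: would exceed the remaining budget
meter.on_power_loss()
print(meter.run(1))    # False: a fresh license is needed
```

Counting cycles means that slowing the clock or tampering with a timer does not buy extra computation; it only stretches the same budget over a longer period.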

#### 3.4 Trusted Execution Environments

Trusted Execution Environments (TEEs) are isolated environments created within a processor that protect the code and data running inside them from being accessed or modified by other parts of the system. The key difference between security modules and TEEs is that TEEs create a protected environment on the main processor cores, whereas a security module is a separate lower-performance processor specialized for security-related tasks. While security modules can be sufficient for protecting highly sensitive information, TEEs provide an additional layer of protection around the primary computational work performed by the chip.

TEEs are typically used to protect data inside the environment from spying or interference by other parts of the system, such as malware, other users, or the platform software provided by a cloud provider. In the case of on-chip governance mechanisms, TEEs can be used to enable a chip to remotely attest to the state of the TEE and the code running inside the TEE, with these claims being verifiable by third parties.

This can enable certain types of privacy-preserving collaboration using a technique known as multi-party computation. For example, one party could set up a TEE on a chip and attest to another party about the specific code that is loaded in the TEE. The other party could then send encrypted data to the TEE, where it is processed by the attested code and the results are shared, without the party operating the chip ever having access to the unencrypted data.<sup>44</sup> This approach conceivably could be used by a third-party evaluator to run tests on an AI model without ever having direct access to the unencrypted weights.
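The attestation step described above can be sketched in a few lines. This is a simplified illustration, not any vendor's protocol: the function names are hypothetical, and an HMAC over a shared `DEVICE_KEY` stands in for the asymmetric, manufacturer-certified signing key that a real TEE would use. With HMAC the verifier must hold the same secret, which a real deployment would avoid. The verifier's fresh `nonce` prevents an old report from being replayed.

```python
import hashlib
import hmac
import secrets

# Hypothetical per-chip secret; a stand-in for an asymmetric attestation key.
DEVICE_KEY = secrets.token_bytes(32)


def measure(code: bytes) -> bytes:
    """The 'measurement' of a TEE: a hash of the code loaded into it."""
    return hashlib.sha256(code).digest()


def tee_attest(loaded_code: bytes, nonce: bytes,
               key: bytes = DEVICE_KEY) -> tuple[bytes, bytes]:
    """Chip side: bind the measurement to the verifier's nonce and sign the report."""
    report = measure(loaded_code) + nonce
    tag = hmac.new(key, report, hashlib.sha256).digest()
    return report, tag


def verify_attestation(report: bytes, tag: bytes, expected_code: bytes,
                       nonce: bytes, key: bytes = DEVICE_KEY) -> bool:
    """Verifier side: check the signature, then the code identity and freshness."""
    expected_tag = hmac.new(key, report, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected_tag):
        return False  # report was not produced with the chip's key
    return report == measure(expected_code) + nonce
```

Only if `verify_attestation` returns `True` would the second party encrypt its data to the TEE. A report for different code, or one bound to an old nonce, is rejected.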

TEEs also might be useful for implementing privacy-preserving logging of information during training. This would allow for retrospective inspections of the training process. A recent paper proposes a protocol for verifying adherence to rules related to AI training (for example, the amount of compute, data, or training process used).<sup>45</sup> In this proposal, weights on a chip would be hashed and signed at random times during training, and these hashes would be logged.<sup>46</sup> The logged hashes could be used later to prove which chips were used to train a given model, and to verify the provenance of a model through the provision and replication<sup>47</sup> of training transcripts from the organization that did the training.<sup>48</sup>

<sup>40</sup> Wei Huang et al., "Aion Attacks: Manipulating Software Timers in Trusted Execution Environment," Lecture Notes in Computer Science (Detection of Intrusions and Malware, and Vulnerability Assessment, Springer, 2021), 173–93, <a href="https://doi.org/10.1007/978-3-030-80825-9\_9">https://doi.org/10.1007/978-3-030-80825-9\_9</a>.

<sup>41</sup> However, if one only tracks computations rather than time, in theory it would be possible for an actor intending to circumvent restrictions to collect a number of powered-off, authorized chips. These chips then could be used for whatever number of operations they are authorized for, even long after the overseer has stopped authorizing that actor's chips. For example, if an actor's chips were authorized every 24 hours, and their authorization was revoked at the start of a 7-day training run, the training run could still be completed if they had seven times as many chips in reserve as were being used for the training run. However, due to how expensive it would be to keep such reserve compute, this is unlikely to be a major problem. This problem also could be addressed simply by shortening the license reauthorization period.

<sup>42</sup> It might be possible to break this using some kind of fault injection attack to prevent the clock cycle counter from incrementing. However, it would likely be extremely difficult and costly to carry out such an attack repeatedly, on dozens of chips in operation, without interfering with the normal functions of the chip. See "AON Timer Technical Specification," OpenTitan Documentation, February 23, 2023, https://docs.opentitan.org/hw/ip/aon\_timer/doc/index.html for an example of this approach. An attacker may be able to slow the clock down, but doing so would also slow down the useful computations done by the chip by the same amount.

<sup>43</sup> See "Top Earlgrey," <a href="https://opentitan.org/book/hw/top\_earlgrey/index.html">https://opentitan.org/book/hw/top\_earlgrey/index.html</a> for an example of this approach.

<sup>44</sup> Confidential Computing Consortium, "Confidential Computing," 10.
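The hash-logging scheme described above (weights hashed and signed at random times, then later checked by replaying part of the training run) can be illustrated with a toy model. Everything here is an assumption made for illustration: the update rule `train_step`, the `repr`-based serialization, the logging probability, and the use of HMAC in place of a signature made with a key held inside the chip. The sketch also assumes training is bit-for-bit reproducible, which real GPU training runs generally are not without extra care.

```python
import hashlib
import hmac
import random


def hash_weights(weights: list[float]) -> str:
    """Hash a snapshot of the model weights (toy serialization via repr)."""
    return hashlib.sha256(",".join(repr(w) for w in weights).encode()).hexdigest()


def sign(key: bytes, step: int, digest: str) -> str:
    # HMAC stands in for a signature made with a key held inside the chip.
    return hmac.new(key, f"{step}:{digest}".encode(), hashlib.sha256).hexdigest()


def train_step(weights: list[float], batch: list[float]) -> list[float]:
    # Hypothetical deterministic update rule, used only for illustration.
    return [w - 0.1 * (w - x) for w, x in zip(weights, batch)]


def train_with_logging(weights, batches, chip_key, log_prob, rng):
    """Train, and at random steps log (step, hash of weights, chip signature)."""
    log = []
    for step, batch in enumerate(batches):
        weights = train_step(weights, batch)
        if rng.random() < log_prob:
            digest = hash_weights(weights)
            log.append((step, digest, sign(chip_key, step, digest)))
    return weights, log


def verify_segment(entry_a, entry_b, start_weights, segment_batches, chip_key) -> bool:
    """Replay training between two logged checkpoints on trusted hardware."""
    for step, digest, sig in (entry_a, entry_b):
        if not hmac.compare_digest(sig, sign(chip_key, step, digest)):
            return False  # entry was not signed by the chip
    if hash_weights(start_weights) != entry_a[1]:
        return False  # claimed starting weights do not match the log
    weights = start_weights
    for batch in segment_batches:
        weights = train_step(weights, batch)
    return hash_weights(weights) == entry_b[1]


# Usage: a short toy run that logs roughly 30% of steps.
rng = random.Random(0)
batches = [[float(i), float(i + 1)] for i in range(50)]
final_weights, log = train_with_logging([0.0, 0.0], batches, b"chip-secret", 0.3, rng)
```

Because the logging steps are random, the trainer cannot predict which intermediate states will be checked. A transcript that was altered between two logged checkpoints fails to reproduce the second hash.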

#### 3.5 When Should a Security Module vs. a Trusted Execution Environment Be Used?

Security modules use separate dedicated processors for handling security-critical operations like cryptography and enforcing policies. TEEs are isolated environments created within the main processor(s) of a chip to protect code and data from being accessed by other software on the system.

A security module could be much simpler than its associated AI chip, and thus much more secure. If the interface between the security module and the user-accessible parts of the system can be kept very simple, it is much more feasible to ensure it does not have major vulnerabilities that could be exploited to gain access to the security module from the main processor.

Trusted execution environments, on the other hand, run on the main processor(s) themselves. Because they share complex hardware with untrusted user code, TEEs have often been vulnerable to side-channel attacks that exploit shared resources like caches.<sup>49</sup> A separate security module reduces this risk because user code is not allowed to run on it. However, TEEs are necessary to enable remote attestation of code (and data) running on the chip's main processors. As such, this submission suggests using security modules in on-chip governance mechanisms where possible, such as for requiring a valid operating license, and using TEEs only where necessary, such as for enabling verifiable claims about training compute usage.

## 4 Challenges for Implementation

Many of the required features for on-chip governance mechanisms already are present on commercial devices. Apple's iPhone is one of the most well-realized implementations. The secure boot functionality of an iPhone aims to ensure that only legitimate firmware and legitimate versions of the iOS operating system can be booted. Because only legitimate versions of iOS can be booted, Apple can tightly control the apps that can be run. This functionality is enabled in part by the Apple Secure Enclave Processor, a security module also found on other Apple devices such as MacBooks.<sup>50</sup>
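The chain-of-trust idea behind secure boot can be illustrated with a short sketch. Under the simplifying assumptions here, immutable boot ROM holds the digest of the first stage, and each stage embeds the digest it expects of the next stage, so modifying any stage (for example, a jailbroken operating system) halts the boot. Real implementations, including Apple's, generally verify cryptographic signatures rather than fixed digests so that software can be updated. The `Stage` and `secure_boot` names are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stage:
    name: str
    code: bytes
    next_digest: Optional[str]  # digest this stage expects of the following stage

    def digest(self) -> str:
        # Hash the code and the embedded expectation, so neither can be altered.
        h = hashlib.sha256(self.code)
        h.update((self.next_digest or "").encode())
        return h.hexdigest()


def secure_boot(rom_digest: str, stages: list[Stage]) -> list[str]:
    """Run stages in order, refusing any stage that fails verification."""
    expected: Optional[str] = rom_digest  # root of trust, fixed in boot ROM
    booted = []
    for stage in stages:
        if expected is None or stage.digest() != expected:
            raise RuntimeError(f"boot halted: {stage.name} failed verification")
        booted.append(stage.name)
        expected = stage.next_digest  # this stage now vouches for the next one
    return booted
```

The chain is built from the last stage backward: the OS image is hashed first, its digest is embedded in the firmware, and the firmware's digest is embedded in the bootloader, whose digest is burned into ROM.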

<sup>45</sup> Shavit, "What Does It Take to Catch a Chinchilla?"

<sup>46</sup> Hashing refers to transforming data into a short alphanumeric sequence of a fixed length. Hashes are generated using algorithms such that the same data always will produce the same hash, but without the hash revealing the original data. This allows the owner of hashed data to prove that their data generated that hash, without revealing the original data itself.

<sup>47</sup> As part of verifying a model's provenance, the proposed scheme involves retraining parts of the model using the provided transcripts on trusted third-party hardware, to check that the resulting weights match the originally logged hashes. This retraining step verifies that the transcripts accurately reflect the original training process.

<sup>48</sup> In most cases the compute operator could be trusted to store the logs, because logging would either be a voluntary action taken by the operator or required by some type of regulatory regime with the capacity to take enforcement action against anyone found to not have kept the required logs. However, it would be a useful additional security feature for the chip to have secure non-volatile storage in which to keep (cryptographic hashes of) some recent logs. This would allow an inspector to detect that something is wrong if the operator has succeeded in stealing the private key on the chip (e.g., through side channels) and forged logs but has not succeeded in otherwise tampering with the chip. Inspecting these logs in person would be costly but likely worth it in high-stakes situations.

<sup>49</sup> Stephan van Schaik et al., "SoK: SGX.Fail: How Stuff Get eXposed," 2022, <a href="https://sgx.fail">https://sgx.fail</a>; Huang et al., "Aion Attacks."

<sup>50</sup> Apple, "Apple Platform Security," May 2022, 9–17, https://help.apple.com/pdf/security/en\_US/apple-platform-security-guide.pdf.

Many of these features also are present on the world's leading AI GPU, the NVIDIA H100. It and most other NVIDIA GPUs include a dedicated security module.<sup>51</sup> The H100 also includes a TEE known as "NVIDIA Confidential Computing."<sup>52</sup> The H100 is relatively uncommon among GPUs for having a TEE, but TEEs are relatively common on CPUs, as are dedicated security modules.<sup>53</sup> Despite its advanced features, the H100 still may not support all of the mechanisms required for an ideal implementation of the governance measures described in this submission, even with appropriate firmware updates. However, this example, together with the commercial hardware licensing schemes already implemented by Intel and IBM, shows that the features discussed thus far are likely feasible and economical to implement on AI chips.<sup>54</sup> Given that NVIDIA chips are by far the most capable and popular for training cutting-edge models, it would be valuable to build on or refine their existing security features into an initial implementation of on-chip governance mechanisms.<sup>55</sup>

More challenging will be ensuring the integrity of these mechanisms in the face of efforts by determined and well-resourced adversaries. In real-world applications, the security features and mechanisms described in the previous section would be exposed to adversarial parties attempting to compromise them in various ways. The risks of these mechanisms being misused by third-party hackers or for unlawful surveillance also must be considered.

This section analyzes these challenges in detail. It first describes the privacy and cybersecurity implications of on-chip governance mechanisms and offers thoughts on how mechanisms should be designed to avoid these issues. It then turns to the principal challenge for implementation: making on-chip governance mechanisms sufficiently secure to defend against an adversary with physical access to the chip. The section presents three prospective operating contexts and threat models and analyzes how far away current technologies are from being mature enough to deploy in each of these contexts. A more detailed discussion of the nature and feasibility of the required security technologies can be found in Appendix B.

## The track record of similar technology

The Apple Secure Enclave Processor is a security module found on many Apple devices, including iPhones and MacBooks.<sup>56</sup> Its primary purpose is to protect sensitive information such as cryptographic keys. It also plays a role in Secure Boot.<sup>57</sup> Across its various iterations since it was first deployed in 2013, the Apple Secure Enclave Processor has proven to be quite secure.<sup>58</sup> Only one major publicly known vulnerability has been discovered, in 2020.<sup>59</sup>

<sup>51</sup> AleksandarK, "NVIDIA Unlocks GPU System Processor (GSP) for Improved System Performance," TechPowerUp, March 12, 2023, https://www.techpowerup.com/291088/nvidia-unlocks-gpu-system-processor-gsp-for-improved-system-performance; NVIDIA, "NVIDIA Accelerated Linux Graphics Driver README and Installation Guide," February 2023, chap. 43, https://download.nvidia.com/XFree86/Linux-x86\_64/530.30.02/README/gsp.html. The current version is code-named "Peregrine." Marko Mitic, "Systematically Securing the RISC-V - Secure Foundation for Embedded Functionality," https://www.youtube.com/watch?v=17i1kfHvWNI; Mike Heskin, "Ok so, Considering That: A) Nvidia Has Moved Away from Falcon for Good, Replacing It with a RISC-V Based Solution ('Peregrine'); b) Nintendo No Longer Uses the TSEC for Secure Boot on New Switch Units;," Tweet, Twitter, January 29, 2021, https://twitter.com/hexkyz/status/1355168275856982019.

<sup>52</sup> "NVIDIA Confidential Computing," NVIDIA, July 25, 2023, <a href="https://www.nvidia.com/en-us/data-center/solutions/confidential-computing/">https://www.nvidia.com/en-us/data-center/solutions/confidential-computing/</a>.

<sup>53</sup> For example, Intel's CPUs rely on the Intel Management Engine (Rivka Gehler et al., "Intel® Converged Security and Management Engine (Intel® CSME) Security Technical White Paper," October 2022, <a href="https://www.intel.com/content/dam/www/public/us/en/security-advisory/documents/intel-csme-security-white-paper.pdf">https://www.intel.com/content/dam/www/public/us/en/security-advisory/documents/intel-csme-security-white-paper.pdf</a>), and AMD's Epyc server CPUs are equipped with the AMD Secure Processor ("AMD Secure Encrypted Virtualization (SEV)," AMD, <a href="https://www.amd.com/en/developer/sev.html">https://www.amd.com/en/developer/sev.html</a>).

<sup>54</sup> "Capacity on Demand - IBM Documentation"; "Intel On Demand."

<sup>55</sup> The Google entries use Google-designed TPU chips. Implementing on-chip governance mechanisms on TPUs is less urgent because Google only offers access to TPUs via their own cloud services, and thus can implement governance mechanisms at the cloud service layer. Epoch, "Parameter, Compute and Data Trends in Machine Learning," <a href="https://docs.google.com/spreadsheets/d/1AAIebjNsn]j\_uKALHbXNfn3\_YsT6sHXtCU0q7OIPuc4/">https://docs.google.com/spreadsheets/d/1AAIebjNsn]j\_uKALHbXNfn3\_YsT6sHXtCU0q7OIPuc4/</a>.

<sup>56</sup> Apple, "Apple Platform Security," May 2022, 10–17, https://help.apple.com/pdf/security/en\_US/apple-platform-security-guide.pdf.

This is despite the Processor being subject to substantial amounts of security research<sup>60</sup>—and strong interest in circumventing these safeguards from much of Apple's customer base. Circumventing secure boot on iPhones is popularly known as "jailbreaking" iPhones. While jailbreaks were common in the early- to mid-2010s, publicly known ways to jailbreak the most recent iPhones have become much rarer since the late 2010s as Apple has improved its security. Today jailbreaking is only possible if the phone's operating system hasn't been updated in several years.<sup>61</sup>

Other relevant efforts to secure hardware against attacks have not necessarily achieved this level of success. In 2021, NVIDIA introduced "Lite Hash Rate" (LHR) limitations on some of its GeForce gaming GPUs.<sup>62</sup> The purpose of the LHR feature was to limit the cryptocurrency mining performance ("hash rate") of these GPUs to ensure the availability of gaming GPUs for gamers, with cryptocurrency miners instead purchasing NVIDIA's dedicated line of cryptocurrency mining GPUs.<sup>63</sup> The hash rate limiter appears to have been based on secure boot features verifying that the code controlling the GPU was legitimate.<sup>64</sup> That code then looked for a certain pattern of memory accesses to detect cryptocurrency mining, and then throttled the performance of the GPU.<sup>65</sup> However, methods for partial circumvention were developed in a few months, and full circumvention was achieved a little more than a year after the release of the restricted GPUs.<sup>66</sup> Full circumvention reportedly became possible after a hack of NVIDIA's code base revealed that the code used to detect memory access patterns could be fooled into constantly resetting its internal counter.<sup>67</sup>
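NVIDIA has not published how the LHR detector worked, so the following is only a hypothetical toy model of the failure mode described above: a detector that counts consecutive mining-like memory accesses, throttles once a threshold is reached, and resets its counter whenever an access looks benign. Under that assumption, an attacker who interleaves a small fraction of benign-looking accesses can keep the counter below the threshold indefinitely while almost all of the work is still mining. The class name and threshold are illustrative.

```python
class ToyMiningDetector:
    """Hypothetical detector; not NVIDIA's actual LHR algorithm."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.counter = 0
        self.throttled = False

    def observe(self, looks_like_mining: bool) -> None:
        if looks_like_mining:
            self.counter += 1
            if self.counter >= self.threshold:
                self.throttled = True
        else:
            # The modeled weakness: any benign-looking access resets the count.
            self.counter = 0


def run(pattern: list[bool], threshold: int = 100) -> bool:
    """Feed an access pattern to a fresh detector; return whether it throttled."""
    detector = ToyMiningDetector(threshold)
    for access in pattern:
        detector.observe(access)
    return detector.throttled


mining_only = [True] * 1_000
# One benign-looking access after every 99 mining accesses: 99% is still mining.
interleaved = ([True] * 99 + [False]) * 10
```

In this model, `run(mining_only)` triggers throttling while `run(interleaved)` never does, which illustrates why a check whose internal state the attacker can influence is a single point of failure across every chip running the same code.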

<sup>57</sup> Apple, "Apple Platform Security," May 2022, 29.

<sup>58</sup> Apple, "Apple Announces iPhone 5s—The Most Forward-Thinking Smartphone in the World," September 10, 2013, https://www.apple.com/newsroom/2013/09/10Apple-Announces-iPhone-5s-The-Most-Forward-Thinking-Smartphone-in-the-World/.

<sup>59</sup> ironPeak, "Crouching T2, Hidden Danger," ironPeak, October 5, 2020, https://ironpeak.be/blog/crouching-t2-hidden-danger/.

<sup>60</sup> See, for example, Tarjei Mandt, Mathew Solnik, and David Wang, "Demystifying the Secure Enclave Processor," August 2016, <a href="http://mista.nu/research/sep-paper.pdf">http://mista.nu/research/sep-paper.pdf</a>; Jeremy Erickson and Misha Davidov, "Deciphering the Messages of Apple's T2 Coprocessor," *Duo Security*, February 14, 2019, <a href="https://duo.com/labs/research/apple-t2-xpc">https://duo.com/labs/research/apple-t2-xpc</a>.

<sup>61</sup> As of 2023, publicly available jailbreaks only work on iPhones released several years ago, and only if the user has not updated the software for over two years. It is worth noting that not all methods to install unapproved apps require fully jailbreaking a device. For crowdsourced collations of jailbreaks, see "iOS Jailbreaking," Wikipedia, <a href="https://en.wikipedia.org/w/index.php?title=IOS\_jailbreaking#By\_device\_and\_OS">https://en.wikipedia.org/w/index.php?title=IOS\_jailbreaking#By\_device\_and\_OS</a>; "Can I Jailbreak," <a href="https://canijailbreak.com/">https://canijailbreak.com/</a>.

<sup>62</sup> NVIDIA retired the LHR feature in October 2022 by disabling it in new driver versions (Michael Kan, "Nvidia Confirms 'LHR' Mining Limiter for GPUs Has Been Eliminated," *PCMag*, October 14, 2022, <a href="https://www.pcmag.com/news/nvidia-confirms-lhr-mining-limiter-has-been-eliminated-from-gpus">https://www.pcmag.com/news/nvidia-confirms-lhr-mining-limiter-has-been-eliminated-from-gpus</a>). The feature became obsolete after Ethereum moved to proof-of-stake and demand for GPUs for mining purposes fell.

<sup>63</sup> Matt Wuebbling, "GeForce Is Made for Gaming, CMP Is Made to Mine," NVIDIA Blog, February 18, 2021, <a href="https://blogs.nvidia.com/blog/2021/02/18/geforce-cmp/">https://blogs.nvidia.com/blog/2021/02/18/geforce-cmp/</a>.

<sup>64</sup> This claim is based on the following statement from NVIDIA: "End users cannot remove the hash limiter from the driver. There is a secure handshake between the driver, the RTX 3060 silicon, and the BIOS (firmware) that prevents removal of the hash rate limiter." Jacob Ridley, "Nvidia Says Its Cryptocurrency Mining Limiter 'Cannot Be Hacked,'" *PC Gamer*, February 19, 2021, https://www.pcgamer.com/nvidia-ethereum-mining-limiter-cannot-be-hacked/.

<sup>65</sup> More precisely, it seems only the offending process was throttled. Lolliedieb, How the 100% LHR unlocker works (lolMiner interview), interview by Seb Hezlo, May 12, 2022, <a href="https://www.youtube.com/watch?v=LgAr4Erm_4o">https://www.youtube.com/watch?v=LgAr4Erm\_4o</a>.

<sup>66</sup> Michael Crider, "Nvidia's Crypto-Crippling 'Lite Hash Rate' GPU Tech Has Been Defeated," PCWorld, May 9, 2022, <a href="https://www.pcworld.com/article/698962/nvidia-rtx-cards-fully-unlocked-for-crypto-miners.html">https://www.pcworld.com/article/698962/nvidia-rtx-cards-fully-unlocked-for-crypto-miners.html</a>.

<sup>67</sup> Lolliedieb, How the 100% LHR unlocker works (lolMiner interview).

## 4.1 Privacy, Surveillance, and Cybersecurity Implications

One of the most immediate concerns for on-chip governance mechanisms is their potential to be misused, either by the owner of the mechanism to conduct unlawful surveillance or by third-party hackers taking advantage of insecure "back doors."

First, on-chip governance mechanisms should be designed to minimize the danger of such misuse. In particular, mechanisms for remotely disabling chips should be designed to respond to the absence of authorization, rather than an active shut-down signal. This means that if someone stole the keys to this system, the only misuse that would be possible would be to stop the chips from being disabled. This, of course, would be very damaging for the intended goal of the mechanisms but would not enable directly harmful misuse, such as an unexpected shutdown signal during a period of crucial operation. The previous section emphasizes robust secure boot functionality in part because it increases the security of devices, by making it more difficult for malware to compromise low levels of the software stack, rather than making any type of attack or misuse more feasible.
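The "absence of authorization" design can be sketched as follows. In this hypothetical sketch (the class and function names are illustrative), the chip runs only while it holds an unexpired license signed by the licensing authority, and there is deliberately no command that halts it. An HMAC stands in for the public-key signature a real design would use; with a real signature scheme the chip would hold only a verification key, so extracting it from the chip would not let an attacker forge licenses. `now` is assumed to come from a trusted counter on the chip, such as a clock-cycle counter.

```python
import hashlib
import hmac


def issue_license(authority_key: bytes, chip_id: str, valid_until: int) -> bytes:
    """Authority side: sign (chip_id, expiry). HMAC stands in for a signature."""
    msg = f"{chip_id}|{valid_until}".encode()
    return hmac.new(authority_key, msg, hashlib.sha256).digest()


class GovernedChip:
    def __init__(self, chip_id: str, authority_key: bytes):
        self.chip_id = chip_id
        self._key = authority_key
        self._valid_until = 0  # no license yet, so the chip starts disabled

    def submit_license(self, valid_until: int, tag: bytes) -> bool:
        """Accept a license only if it is genuine, for this chip, and extends validity."""
        msg = f"{self.chip_id}|{valid_until}".encode()
        expected = hmac.new(self._key, msg, hashlib.sha256).digest()
        if hmac.compare_digest(tag, expected) and valid_until > self._valid_until:
            self._valid_until = valid_until
            return True
        return False

    def may_run(self, now: int) -> bool:
        # There is no "shutdown" handler: the chip stops only when its license
        # lapses, so a stolen signing key can extend operation but never halt it.
        return now < self._valid_until
```

Because licenses are bound to a chip ID, a license issued for one chip cannot be reused on another, and an operator who stops receiving licenses simply sees the chip stop at expiry.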

Relatedly, verification systems could and should be designed such that the compute operator is responsible for communicating the verified claims to the verifier. There is no need for verifiers to be able to read information from the system unilaterally, and if the verifier does not have that capability, no third parties can exploit the capability. Instead of unilateral surveillance, this should be thought of as a collaboration between a verifier and the chip owner. This collaboration also could be made fully privacy-preserving (i.e., not revealing sensitive code or data) using techniques from multi-party and confidential computing. If a chip owner refuses to engage in such a collaboration, restriction mechanisms could allow the verifier (e.g., a regulator or device manufacturer with particular terms of use or enforcing the terms of an export license) to prevent them from continuing to use the chip.

There have been some concerns that security modules similar to the type proposed here can provide "back doors" to computers. Traditionally, security modules and system processors have had an extreme level of trust and privileged access, such that if an attacker can compromise such a component, they can bypass other forms of security. However, practically all CPUs and GPUs already have system processors that are at least as concerning from this perspective as a security module would be, and a security module's simplicity makes it inherently easier to secure. Security modules also should be designed to have limited access to the rest of the chip, such that compromising the module would not allow sensitive data to be exfiltrated.


<sup>68</sup> For examples of some relevant questions to consider in the context of state control, see Richard Danzig, "Technology Roulette: Managing Loss of Control as Many Militaries Pursue Technological Superiority," Center for a New American Security, 2018, app. 1.

<sup>69</sup> For example, an auditor could run tests on model weights without having direct access to the unencrypted weights, or obtain proof about which training data was used to produce a set of model weights. See Confidential Computing Consortium, "Confidential Computing"; Choi, Shavit, and Duvenaud, "Tools for Verifying Neural Models' Training Data."

<sup>70</sup> Ms Smith, "Now You, Too, Can Disable Intel ME 'backdoor' Thanks to the NSA," CSO Online, August 29, 2017, <a href="https://www.csoonline.com/article/3220476/researchers-say-now-you-too-can-disable-intel-me-backdoor-thanks-to-the-nsa.html">https://www.csoonline.com/article/3220476/researchers-say-now-you-too-can-disable-intel-me-backdoor-thanks-to-the-nsa.html</a>.

<sup>71</sup> For example, a vulnerability in the Apple T2 security chip (an earlier iteration of the Secure Enclave Processor) allowed attackers with physical access to gain privileged access to the device (ironPeak, "Crouching T2, Hidden Danger"). The reference implementation of the widely used Trusted Platform Module 2.0 standard for security modules also recently was found to have (patchable, likely unexploitable) firmware vulnerabilities (Francisco Falcon, "Vulnerabilities in the TPM 2.0 Reference Implementation Code," Quarkslab's blog, March 14, 2023, <a href="https://blog.quarkslab.com/vulnerabilities-in-the-tpm-20-reference-implementation-code.html">https://blog.quarkslab.com/vulnerabilities-in-the-tpm-20-reference-implementation-code.html</a>). Many vulnerabilities also have been discovered in the Intel Management Engine ("Search Results - 'Intel Management Engine,'" CVE, <a href="https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=Intel+Management+Engine">https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=Intel+Management+Engine</a>).

<sup>72</sup> Note that restricted access to information on the rest of the chip would trade off against the module's ability to attest to that information.

On-chip governance mechanisms should not be used to share any kind of personal data. The verification-based approach proposed in this submission allows the compute owner to choose what kind of information is shared and removes the ability of a verifier or controller to directly acquire sensitive data. These mechanisms will be appropriate only for chips used in particular contexts, such as where export control violations are likely, or to support domestic regulation lawfully governing the usage of AI chips. This kind of limited application appears well-supported by current norms and laws: a report from the Center for Strategic and International Studies analyzes the privacy implications of collecting or requiring the collection of commercial data in an export controls context and finds that to date, foreign countries' domestic digital privacy frameworks explicitly focus on personal data while leaving commercial data more open.<sup>73</sup>

## 4.2 Overview of Threat Models and Defenses

The threat models considered here assume that the attacker has physical access to the AI hardware.<sup>74</sup> Different types of attackers will have different levels of willingness to spend resources to circumvent a mechanism, and different degrees of "covertness"—the desire to avoid being discovered to have attempted to circumvent a mechanism.<sup>75</sup> Based on these considerations, this submission loosely groups attackers into three threat models of increasing difficulty:

1. **Minimally adversarial contexts**, where attackers do not spend much on attacks and are very averse to being discovered attempting to compromise mechanisms.
2. **Covertly adversarial contexts**, where attackers are more willing to spend substantial resources to compromise mechanisms but still want to avoid being caught doing so.
3. **Openly adversarial contexts**, where attackers are willing to spend very significant resources to compromise mechanisms and are indifferent to this being discovered.

Each of these categories requires a distinct approach to defense. The table below summarizes these different approaches. In all three contexts, physical, firmware, and software security are important. A detailed discussion of the nature and feasibility of the required security features in each of these areas can be found in <u>Appendix B</u>.

<sup>73</sup> Reinsch and Benson, "Digitizing Export Controls."

<sup>74</sup> The assumption of physical access is made because on-chip governance mechanisms are useful primarily in cases where an untrusted actor will or may have physical access to the device. In other contexts, such as when a trusted cloud provider wants to enforce restrictions on their customers, the restrictions can be imposed at the software level. Hardware-level implementations of restrictions still can be somewhat useful due to being particularly difficult to circumvent, but they are not qualitatively superior to software-level implementations.

<sup>75</sup> This submission borrows the terminology of covert adversaries from Yonatan Aumann and Yehuda Lindell, "Security Against Covert Adversaries: Efficient Protocols for Realistic Adversaries," in *Theory of Cryptography*, vol. 4392, Lecture Notes in Computer Science (Springer, 2007), 137–56, <a href="https://doi.org/10.1007/978-3-540-70936-7_8">https://doi.org/10.1007/978-3-540-70936-7\_8</a>.

#### Overview of threat models and required protections

| Threat model | Key attacker properties | Protections required | Example applications | Feasibility | Time to implement minimal solution | Time to implement ideal solution |
|---|---|---|---|---|---|---|
| Minimally adversarial | Low resources, highly covert | Basic security measures | Domestic regulation, export control enforcement on cloud services | **High:** Current level of security likely sufficient. | Months: Some mechanisms could be implemented as changes to firmware and chip configuration. | 2–5 years: There are likely software- and hardware-level vulnerabilities in current security features. |
| Covertly adversarial | Moderate to high resources, covert | Exceptionally secure software, tamper-evidence | Export control enforcement against large companies, treaty verification | **Moderate:** Significant additional investment in software security and tamper-evidence required. | Months: Firmware changes and ad hoc tamper-evidence likely could be implemented in months, and may be sufficient in some cases. | 2–5 years: There are likely hardware-level vulnerabilities in current hardware security features. Improved tamper-evident features also could take years to reach large-scale production. |
| Openly adversarial | High resources, non-covert | Provably secure software, tamper-proofing | More challenging cases of export control enforcement and treaty verification, where other deterrence fails | **Uncertain:** Significant investments in software and hardware security may be sufficient. | 2–3 years: Hardware-level vulnerabilities would need to be resolved, and rudimentary tamper-proofing measures would need to be developed. | 4–8 years: Truly robust tamper-proof packaging could take years to develop and test, due to the need for slow physical production and testing processes. |

## Thinking in terms of cost imposition

When considering whether a given set of defenses would be sufficient, it is important to consider that the most dangerous forms of export control circumvention likely would require an attacker to overcome mechanisms on *large numbers* of chips (many thousands), either to train powerful AI models or to deploy them at scale.

This is both advantageous and disadvantageous for the defender. On the one hand, evidence of tampering with large numbers of chips would be easier to discover, and labor-intensive tampering would be very expensive. On the other hand, the need to tamper with large numbers of chips means that the up-front costs of developing an attack can be spread across many chips, which can make some types of attacks look cheap relative to their payoff. For example, if a foreign military at some point illegally acquired \$500 million worth of AI chips (around 10,000 leading-edge AI chips at today's prices), it could be worthwhile for it to spend another \$500 million developing a way to defeat the remote disabling mechanism on those chips. By contrast, costs that must be paid for each chip quickly become very large at this scale. This matters most for physical attacks that would require very sophisticated equipment and skilled labor.

It thus becomes important to design security measures that impose high per-chip costs on attackers. It is also important that any single points of failure that would allow scalable attacks, such as firmware vulnerabilities, be designed to withstand very well-resourced attackers.
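The trade-off between up-front and per-chip costs can be illustrated with a simplified cost model, assumed here rather than taken from the source: an attacker's average cost per chip is the one-off development cost divided by the number of chips, plus any cost that must be repeated on every chip. The \$500 million and 10,000-chip figures come from the example above; the \$1 million development cost and \$100,000 per-chip cost for a physical attack are purely hypothetical.

```python
def per_chip_attack_cost(fixed_cost: float, per_chip_cost: float, n_chips: int) -> float:
    """Average cost of defeating an on-chip mechanism, per chip attacked.

    fixed_cost: one-off cost of developing the attack (e.g. finding a firmware exploit)
    per_chip_cost: cost paid again for every chip (e.g. invasive physical work)
    n_chips: number of chips the attack is applied to
    """
    if n_chips <= 0:
        raise ValueError("n_chips must be positive")
    return fixed_cost / n_chips + per_chip_cost


# Scalable attack: large development cost, negligible marginal cost per chip.
scalable = per_chip_attack_cost(fixed_cost=500e6, per_chip_cost=0.0, n_chips=10_000)
# Hypothetical physical attack: cheap to develop, expensive to repeat on every chip.
physical = per_chip_attack_cost(fixed_cost=1e6, per_chip_cost=100_000.0, n_chips=10_000)

print(f"scalable: ${scalable:,.0f} per chip")  # scalable: $50,000 per chip
print(f"physical: ${physical:,.0f} per chip")  # physical: $100,100 per chip
```

Doubling the number of chips halves the scalable attack's per-chip cost but barely changes the physical attack's, which is why high per-chip costs are the more robust lever for defenders.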

#### 4.2.1 Minimally Adversarial Contexts

In these contexts, would-be attackers do not spend much on attacks and are very averse to being discovered attempting to compromise mechanisms. Examples include technology companies based in the United States or friendly nations that are subject to regulations on the compute used for training or on other development practices. Such companies would be very likely to comply with inspections and have limited motivation to circumvent the restrictions. The security already provided by existing hardware security features, software, and firmware likely would be sufficient for such actors.

As an illustrative case study, NVIDIA's software license agreement currently bans the use of its gaming GPUs in data centers.<sup>76</sup> Even though gaming GPUs can sometimes be viable as more affordable alternatives to data center chips, and NVIDIA has limited ability to directly enforce this license agreement, no major U.S. cloud provider offers cloud AI computing services based on gaming GPUs.

Some other examples of minimally adversarial contexts include:

- Compute vendors enforcing license agreements
- Enforcement and monitoring of domestic regulation
- Treaty verification between countries with high mutual trust
- Auditing and agreements between AI companies with high trust in each other.

In characterizing a situation as minimally adversarial, policymakers and counterparties will need to consider how much these actors would have to gain from circumventing a mechanism. In many cases, an actor may not have much to gain. But in some cases, skirting a regulation might allow a company to gain billions of dollars' worth of market share by developing a better AI system. In such a case, a company may be willing to spend substantial resources circumventing a restriction or monitoring system. One analogous example is the Volkswagen emissions scandal, in which the company used software that detected emissions testing in order to evade regulatory limits.<sup>78</sup> In such cases, it is especially important to ensure that on-chip governance mechanisms can resist sophisticated attacks, and characterizing such contexts as covertly adversarial may be more appropriate.

#### 4.2.2 Covertly Adversarial Contexts

In these contexts, attackers are more willing to spend substantial resources to compromise mechanisms but still want to avoid being caught. Companies in some countries, such as China, historically have shown less respect for license agreements or intellectual property and may be relatively willing to attempt attacks on on-chip governance mechanisms. However, facing threats such as being cut off from further chip supplies or broader U.S. sanctions, these companies would have incentives not to attempt such attacks openly. Many potential applications of on-chip mechanisms for export control enforcement and international agreements therefore can be characterized as covertly adversarial contexts.

In covertly adversarial contexts, if a high degree of software security has been achieved, the key to defense becomes tamper-evidence: ensuring that any physical tampering would leave evidence that inspectors could discover. If inspections (either in person or remote, if the technology exists) are feasible, and violators can and would be effectively punished, tamper-evidence should be sufficient to achieve deterrence. From a technical implementation perspective, tamper-evidence appears relatively easy to achieve. See Appendix B for further details.
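The deterrence condition can be made concrete with a simplified expected-value model, assumed here for illustration and not drawn from the source: a risk-neutral attacker tampers only if the expected gain exceeds the probability of detection times the penalty. Tamper-evidence and inspections raise the detection probability, while credible enforcement sets the penalty. All figures below are hypothetical.

```python
def tampering_deterred(p_detect: float, penalty: float, gain: float) -> bool:
    """Simplified risk-neutral deterrence test.

    p_detect: probability that tampering is discovered (raised by tamper-evidence and inspections)
    penalty: cost imposed on a discovered violator (fines, loss of chip supply, sanctions)
    gain: value the attacker expects from circumventing the mechanism
    Returns True when the expected penalty is at least as large as the gain.
    """
    if not 0.0 <= p_detect <= 1.0:
        raise ValueError("p_detect must be a probability in [0, 1]")
    return p_detect * penalty >= gain


# Hypothetical: frequent inspections, large penalty, $200M potential gain.
print(tampering_deterred(p_detect=0.5, penalty=1e9, gain=200e6))   # True
# Same penalty, but tampering is rarely discovered.
print(tampering_deterred(p_detect=0.05, penalty=1e9, gain=200e6))  # False
# No penalty can be imposed, so detection alone does not deter.
print(tampering_deterred(p_detect=1.0, penalty=0.0, gain=200e6))   # False
```

The last case corresponds to openly adversarial contexts, where detection without an enforceable penalty achieves nothing and defenses must instead make tampering fail outright.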

<sup>76</sup> "License for Customer Use of NVIDIA GeForce Software," NVIDIA, https://www.nvidia.com/en-us/drivers/geforce-license/.

<sup>77</sup> Katyanna Quach, "Nvidia: Using Cheap GeForce, Titan GPUs in Servers? Haha, Nope!," The Register, January 3, 2018, <a href="https://www.theregister.com/2018/01/03/nvidia-server-gpus/">https://www.theregister.com/2018/01/03/nvidia-server-gpus/</a>; Jordan Novet, "Nvidia Made a Change to How It Lets Developers Use Its Chips, and Some Folks Aren't Happy," CNBC, December 27, 2017, <a href="https://www.cnbc.com/2017/12/27/nvidia-limits-data-center-uses-for-geforce-titan-gpus.html">https://www.cnbc.com/2017/12/27/nvidia-limits-data-center-uses-for-geforce-titan-gpus.html</a>.

<sup>78</sup> "Learn About Volkswagen Violations," EPA, September 27, 2022, https://www.epa.gov/vw/learn-about-volkswagen-violations.

#### 4.2.3 Openly Adversarial Contexts

In openly adversarial contexts, tampering efforts cannot be deterred by threats of punishment or penalties. This likely would be the case if export-controlled chips have ended up in the hands of an uncooperative foreign military or other powerful state-linked actors.<sup>79</sup>

It also could sometimes be appropriate to treat international treaty verification and enforcement as openly adversarial. For example, if a country with strong incentives to "cheat" on a treaty has been allowed to amass powerful chips under the treaty's conditions, it would be ideal if the chips were secure enough that the country could not violate its treaty commitments, even if it were willing to openly renege on them.

On-chip governance mechanisms operating in such contexts therefore should be tamper-proof. Tamper-proofing refers to defenses that detect tampering efforts and respond by destroying whatever the attacker was attempting to access. Such tamper-proofing is currently used on some dedicated hardware security modules, but no existing solutions on the market appear to be applicable to AI chips. It seems likely, but not certain, that effective tamper-proofing for AI chips could be developed, but this likely would require investment and time to develop and deploy at scale. See Appendix B for further details.
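The detect-and-destroy behavior described above can be illustrated with a toy software model. This is a hypothetical sketch: real tamper-responsive enclosures implement this logic in dedicated circuitry rather than software, the sensor interface is invented, and the HMAC-SHA256 "attestation" is a stand-in for whatever signature scheme a real chip would use.

```python
import hashlib
import hmac


class TamperResponsiveKeyStore:
    """Toy model of a tamper-responsive enclosure guarding a device secret.

    When a (hypothetical) tamper sensor reports a breach, the secret is
    overwritten with zeros, so an attacker who gets inside finds nothing usable.
    Python cannot guarantee memory erasure; this only illustrates the logic.
    """

    def __init__(self, secret: bytes) -> None:
        self._secret = bytearray(secret)
        self.tampered = False

    def on_sensor_reading(self, enclosure_intact: bool) -> None:
        # Any reported breach triggers irreversible zeroization of the secret.
        if not enclosure_intact and not self.tampered:
            for i in range(len(self._secret)):
                self._secret[i] = 0
            self.tampered = True

    def attest(self, nonce: bytes) -> bytes:
        # Stand-in for signing an attestation: an HMAC keyed with the secret.
        if self.tampered:
            raise RuntimeError("secret destroyed after tamper event; cannot attest")
        return hmac.new(bytes(self._secret), nonce, hashlib.sha256).digest()


store = TamperResponsiveKeyStore(secret=b"\x42" * 32)
before = store.attest(b"verifier-nonce")  # works while the enclosure is intact
store.on_sensor_reading(enclosure_intact=False)
# From here on, store.attest(...) raises RuntimeError.
```

The design choice worth noting is that the response destroys the secret rather than merely logging the event: even an attacker who cannot be punished gains nothing from breaching the enclosure, because the chip can no longer produce valid attestations.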

## 5 Implementation Timelines

If the capabilities and national security risks of broadly capable AI systems continue to grow at the pace seen in 2022 and 2023, the need for highly effective controls will become acute within several years. Crucially, developing and deploying the governance mechanisms described in this submission will take time (months in the most optimistic case, years in the most likely case). Policymakers concerned about this issue therefore should begin formulating policies and preparing appropriate technologies now. Once mandated in the most powerful AI chips, the relevant security features need not be used immediately: The mechanisms described in the previous section would allow for rapid and flexible responses to new developments and threats once installed.

The 2022 U.S. export controls targeting AI chips are an excellent example of the importance of acting early when governing computing hardware. To simplify, the controls restricted the export to China of any chip equal to or better than the NVIDIA A100. At the time of imposition, these controls likely had minimal effect on China's AI industry, because thousands of affected AI chips already were present in China, and chips of similar performance to the A100 were still uncontrolled.<sup>80</sup> But if these controls are kept in place for years, the gap between the best chips on the market and the best chips that Chinese AI developers can legally obtain in 2027 likely will be substantial.<sup>81</sup> Another key lesson is that these export controls were updated a year later to be more effective and to close key loopholes (and likely will continue to be updated). This gives additional reason to begin any similarly technically complex rulemaking process early.

It likely will take 18 months to 4 years to robustly harden the technologies required for on-chip governance mechanisms, and a further 4 years for chips with these mechanisms to become widespread enough to be broadly effective. However, intermediate stages of technological development still will be useful in production contexts. In the short term, firmware updates could be deployed to any AI chips with the necessary security features. This would initiate a "testing phase" for on-chip governance mechanisms, where their usage would be limited to minimally adversarial environments and/or environments where in-person inspections are possible.

<sup>79</sup> Criminal organizations, although ostensibly openly adversarial, are less likely to pose a significant threat in this category. Only the most well-resourced and sophisticated actors, who often may be state-backed, would have the means and the motivation to engage in large-scale AI training or deployment that requires cutting-edge chips, or to overcome sophisticated on-chip security mechanisms.

<sup>80</sup> Since the 2022 export controls, technical thresholds for controlled chips have been updated to include a wider range of AI chips. "Implementation of Additional Export Controls: Certain Advanced Computing Items; Supercomputer and Semiconductor End Use; Updates and Corrections," 88 Fed. Reg. 73458, October 25, 2023, <a href="https://www.federalregister.gov/documents/2023/10/25/2023-23055/implementation-of-additional-export-controls-certain-advanced-computing-items-supercomputer-and">https://www.federalregister.gov/documents/2023/10/25/2023-23055/implementation-of-additional-export-controls-certain-advanced-computing-items-supercomputer-and</a>.

<sup>81</sup> This claim is contingent on the bandwidth threshold in the 2022 export controls being sufficient to severely hamper large-scale supercomputing and AI training. Under the 2022 controls, chip performance in TOP/s could be scaled up indefinitely, so long as inter-chip bandwidth remained below a threshold.

The impact of the additional lag introduced by "sufficient uptake" could be mitigated by tracking the sale of AI chips made before the introduction of on-chip governance mechanisms and restricting their sale to specific actors. For example, the broad ban on the export of high-end AI chips to China and Russia could be kept in place until effective on-chip governance mechanisms have been implemented, at which point licenses could be granted under certain conditions. Recently, the Bureau of Industry and Security suggested that it could make exceptions to export controls for chips equipped with technical mechanisms that would prevent the chips from being used for powerful AI training, and requested proposals for such mechanisms.<sup>83</sup>

#### Implementation stages for on-chip governance

| Stage                 | Required steps and dependencies                                                                                                                                                                                                                                                                                                                            | Expected duration    |
|-----------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------|
| Policy formulation    | Establish policies that require or incentivize chip firms' implementation of on-chip governance mechanisms. Draft requirements should be communicated to chip companies as early as possible to ensure that technical work can commence.                                                                                                                   | ~1 year              |
| Technical development | Develop secure versions of on-chip governance mechanisms based on hardened security modules and other defenses (see Appendix B). Can begin once requirements from the previous stage are sufficiently clear.                                                                                                                                              | 18 months to 4 years |
| Sufficient uptake     | To ensure that all or most cutting-edge AI development can be governed by on-chip mechanisms, these chips first will have to see uptake by the large commercial entities developing the most powerful AI systems. As a rule of thumb, it is assumed that chips that are four years old (approximately two GPU generations) are no longer cost-competitive. | 4 years              |
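Read end to end, the stages in the table above imply a rough overall timeline. The sketch below simply adds the table's ranges; it assumes, as a simplification, that each stage starts only when the previous one ends, whereas in practice stages may partly overlap and intermediate deployments can deliver value earlier.

```python
# Stage durations in years as (low, high) estimates, taken from the table above.
STAGES = {
    "Policy formulation": (1.0, 1.0),     # "~1 year"
    "Technical development": (1.5, 4.0),  # "18 months to 4 years"
    "Sufficient uptake": (4.0, 4.0),      # "4 years"
}


def total_duration(stages: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """Sum low and high estimates, assuming strictly sequential stages."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


low, high = total_duration(STAGES)
print(f"Roughly {low}-{high} years from policy formulation to broad effectiveness")
# Roughly 6.5-9.0 years from policy formulation to broad effectiveness
```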

## 5.1 Timelines for Technical Development of Security Features

This submission defines the goal of technical development as a hardened security module, included on all high-performance data center AI chips, that can ensure that the chip has valid, up-to-date firmware and software and, where applicable, an up-to-date license. The security module would block the chip from operating if these conditions were not met. This valid, up-to-date firmware and software then could help enforce limits on the uses of these chips and offer sophisticated remote attestation capabilities. The security module could ensure that if vulnerabilities are found in this firmware and software, users would have no choice but to update to patched versions where the vulnerability has been fixed.

<sup>82</sup> This submission is focused on governing cutting-edge chips. However, chips lagging behind the cutting edge may still have significant misuse potential in the future, as they could be used to run inference on near-frontier models, or to train smaller, but still dangerous, models. Indeed, the misuse potential of a given AI chip should be expected to grow over time as algorithmic efficiency improves (Danny Hernandez and Tom B. Brown, "Measuring the Algorithmic Efficiency of Neural Networks," arXiv, May 8, 2020, https://doi.org/10.48550/arXiv.2005.04305; Ege Erdil and Tamay Besiroglu, "Algorithmic Progress in Computer Vision" (arXiv, August 24, 2023), https://doi.org/10.48550/arXiv.2212.05153), and more powerful models become widely available. Therefore, to continue to prevent misuse effectively, it may be desirable to lower, rather than raise, the performance thresholds used to determine what kind of regulations a given chip is subject to.

<sup>83</sup> "Implementation of Additional Export Controls: Certain Advanced Computing Items; Supercomputer and Semiconductor End Use; Updates and Corrections," Supplementary information section D.2, *88 Fed. Reg. 73458*, October 25, 2023, <a href="https://www.federalregister.gov/d/2023-23055/p-350">https://www.federalregister.gov/d/2023-23055/p-350</a>.

Technical R&D to support such an implementation would involve:

- Implementations of security modules and trusted execution environments applicable to cutting-edge AI chips, including license requirements and remote attestation
- Development of tamper-evident and tamper-proof technologies specific to high-performance data center chips
- Potential additional features, such as communication between chips to ascertain and report use in large clusters, latency-based geolocation, or logging in secure non-volatile memory
- Red-teaming, verifying, or otherwise enhancing the security of the above features.
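The core check performed by the security module described above can be sketched as follows. This is a hypothetical, simplified model: an HMAC stands in for the vendor's digital signature (real secure boot would verify an asymmetric signature against a key fixed in the chip), and `min_version` stands in for whatever mechanism a vendor would use to revoke firmware with known vulnerabilities. All names are illustrative.

```python
import hashlib
import hmac
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass
class FirmwareImage:
    version: int
    payload: bytes
    tag: bytes  # authentication tag produced by the vendor at release time


@dataclass
class UsageLicense:
    expires: date


def vendor_tag(version: int, payload: bytes, vendor_key: bytes) -> bytes:
    # Bind the version number to the payload so an old image cannot be relabeled as new.
    message = version.to_bytes(4, "big") + payload
    return hmac.new(vendor_key, message, hashlib.sha256).digest()


def may_operate(
    fw: FirmwareImage,
    lic: Optional[UsageLicense],
    vendor_key: bytes,
    min_version: int,
    today: date,
) -> Tuple[bool, str]:
    """Return (allowed, reason); the chip stays blocked unless every check passes."""
    if not hmac.compare_digest(vendor_tag(fw.version, fw.payload, vendor_key), fw.tag):
        return False, "firmware not authenticated by vendor"
    if fw.version < min_version:
        return False, "firmware predates the minimum patched version"
    if lic is None or today > lic.expires:
        return False, "license missing or expired"
    return True, "ok"


key = b"hypothetical-vendor-key"
fw = FirmwareImage(version=7, payload=b"firmware", tag=vendor_tag(7, b"firmware", key))
lic = UsageLicense(expires=date(2030, 1, 1))
print(may_operate(fw, lic, key, min_version=7, today=date(2025, 6, 1)))  # (True, 'ok')
print(may_operate(fw, lic, key, min_version=8, today=date(2025, 6, 1)))
# (False, 'firmware predates the minimum patched version')
```

Raising `min_version` after a vulnerability is patched is what would leave users no choice but to update, as described above; how that value is delivered securely to the chip is outside the scope of this sketch.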

The rest of this section offers more detailed estimates of the time required to design and implement sufficient defenses for different operating contexts and threat models, drawing on conversations with chip industry experts.<sup>84</sup>

For minimally adversarial contexts the current level of hardware security likely would be sufficient, and thus many mechanisms could be implemented as firmware updates. This would take a few months. Some mechanisms may not be possible to implement this way on current hardware, in which case silicon-level changes would be required, and the time to implement them would increase to between 18 months and 4 years.<sup>85</sup>

For covertly adversarial contexts, a minimal solution likely could be deployed in a few months, using a combination of firmware changes and post hoc tamper-evident measures, such as adding tamper-evident seals to server cases. However, given the mixed track record of similar existing efforts, it is likely that the implementations of security features such as secure boot found on existing AI chips include "unpatchable" vulnerabilities that a well-resourced adversary could find. Therefore, a solution like this should not be considered fully trustworthy but may be acceptable to deploy in cases where there is sufficient monitoring, and sufficient capability and willingness to deter evasion attempts through legal means.

Designing and thoroughly testing a highly secure basic security module likely would take at least a year, and it would need to be finished at least a year before the chip enters the market. Thorough external testing of the finished product could add at least a year to this. Ideally, this would be combined with custom tamper-evident packaging and protections against side channel and fault injection attacks. Developing and scaling up the production of novel physical protections could take years but could be done concurrently with the development of the security module.

For openly adversarial contexts, an extremely well-secured security module would be a necessity, since there would be little ability to deter hacking. Additionally, some kind of tamper-proof envelope would be required. Developing and producing such tamper-proofing features likely would take several years, because existing solutions are unsuitable and novel physical mechanisms would need to be prototyped, physically tested, and then scaled up in production.

## 6 Recommendations

On-chip governance mechanisms present a promising area for further research for computer engineers, computer scientists, and policy researchers. This submission offers the following recommendations to move toward a world where all leading AI chips are both secure and governable. Many of these cannot be implemented by BIS alone, but the authors hope that BIS can play a key coordinating role with other relevant offices and agencies, where the recommendations align with BIS' strategy.

<sup>84</sup> Discussions with chip industry experts (including at NVIDIA), 2023.

<sup>85</sup> These numbers are based on estimates from current and former employees at major chip companies.

#### Establish government coordination

The White House should issue an executive order establishing a NIST-led interagency working group, focused on getting on-chip governance mechanisms built into all export-controlled data center AI chips.

For on-chip governance to reach commercial scale, long-term collaboration between government and industry will be required. For progress to be made on the time scale required, an executive order is an appropriate forcing function. An executive order also could include other important initiatives to secure the AI supply chain, such as cross-agency coordination to tackle AI chip smuggling and better track other critical inputs to AI.<sup>86</sup>

The National Institute of Standards and Technology (NIST) would make a suitable lead for this effort. Relevant existing NIST initiatives include the CHIPS Program Office, and the Cryptographic Module Validation and Hardware-Based Confidential Computing programs.<sup>87</sup> Expertise and staff also should be drawn from the following agencies and offices:

- The Department of Energy (Sandia National Lab)
- The Department of Commerce (Bureau of Industry and Security and the Office for Policy and Strategic Planning)
- The Department of Defense (DARPA and microelectronics-focused groups)
- The Department of Homeland Security (Cybersecurity & Infrastructure Security Agency)
- The U.S. intelligence community (National Security Agency)
- The National Science Foundation (Center for Hardware and Embedded System Security and Trust).

While efforts to implement on-chip governance mechanisms can be broken down further into distinct policy and technical workstreams, central oversight and steering will help:

- Ground policy development and implementation in technical findings and efforts, and conversely, target technical efforts toward addressing policy issues seen as most compelling
- Account for synergies and dependencies within different areas of effort (for example, ensuring tampering countermeasures are applicable to the most promising security module implementations)
- Provide a single point of contact for industry.

This program should be informed by a technical panel drawn from industry, academia, and government to evaluate the feasibility and challenges (including those around cost and time frames) of technical work toward implementing on-chip governance mechanisms. This panel likely will need to draw on both unclassified and classified information (for example, through classified meetings and reporting annexes) to benefit fully from nongovernment academic and industry expertise on state-of-the-art secure computing hardware, as well as from knowledge of relevant offensive capabilities held by national laboratories and the intelligence community.

#### Create commercial incentives

The Department of Commerce (DoC) should incentivize U.S. chip designers to conduct necessary R&D using "advance export market commitments."

<sup>86</sup> "Preventing AI Chip Smuggling to China," https://www.cnas.org/publications/reports/preventing-ai-chip-smuggling-to-china.

<sup>87</sup> Computer Security Division, Information Technology Laboratory, "Cryptographic Module Validation Program | CSRC," NIST, October 11, 2016, https://csrc.nist.gov/Projects/Cryptographic-Module-Validation-Program; Michael Bartock, Murugiah Souppaya, Jerry Wheeler, Timothy Knoll, Muthukkumaran Ramalingam, and Stefano Righi, "Hardware-Enabled Security: Hardware-Based Confidential Computing," National Institute of Standards and Technology, February 23, 2023, <a href="https://doi.org/10.6028/NIST.IR.8320D.ipd">https://doi.org/10.6028/NIST.IR.8320D.ipd</a>.

Given that on-chip governance mechanisms need to be implemented on commercial chips, much of the necessary R&D will need to happen in an industry setting. "Advance market commitments" are contracts offered by a government to guarantee a viable market for a product once it has been successfully developed.<sup>88</sup> BIS has already suggested it could exempt certain chips from export controls if they meet a set of (yet to be defined) technical requirements.<sup>89</sup> BIS should now make this explicit by using advance market commitments that guarantee export market access, conditional on firms provably implementing a specific set of security features on their data center AI chips.

Export market commitments could include not extending export controls to new jurisdictions, relaxing the "presumption of denial" licensing policy for chip exports to lower-risk customers in China, or moving toward more surgical end-use or end-user-based controls. These commitments could be an effective way of incentivizing development without spending public money: NVIDIA has estimated lost revenue of up to \$400 million in Q4 2022 as a result of existing controls.<sup>90</sup> This figure is likely much higher today, given that NVIDIA's data center revenue has more than doubled.<sup>91</sup>

A key challenge is ensuring that technical requirements are adequately defined. Different tiers of requirements could be appropriate for different export geographies. The DoC should develop these requirements by analyzing specific attacker threat models in different export contexts, drawing on expertise from the National Security Agency and Cybersecurity & Infrastructure Security Agency.

#### Accelerate security R&D

NIST should coordinate with industry and relevant government funding bodies to fund and support hardware security R&D that can be conducted outside leading chip companies and integrated later.

While the bulk of R&D for on-chip governance will need to be conducted by the firms building and selling AI chips at scale, some work could usefully be conducted outside these firms, especially on technologies that would benefit from being standardized across the industry. NIST (and the CHIPS Program Office within NIST) should coordinate with the Semiconductor Research Corporation, DARPA, and other relevant government funding bodies to fund useful R&D performed by academic and/or commercial partners.<sup>92</sup>

For example, R&D on specialized tamper-proof enclosures for high-end chips (physical housings that make it impossible to modify the chip without compromising its operation) could potentially be partly outsourced to academic and commercial hardware security labs. There are many precedents for this: The DARPA-supported Morello program and the NIST-led Supply Chain Assurance project are examples of hardware security programs that include academic and/or commercial partners.<sup>93</sup>

<sup>88</sup> "Creating Advanced Market Commitments and Prizes for Pandemic Preparedness," https://fas.org/publication/creating-advanced-market-commitments-and-prizes-for-pandemic-preparedness/.

<sup>89</sup> "Implementation of Additional Export Controls: Certain Advanced Computing Items; Supercomputer and Semiconductor End Use; Updates and Corrections," Supplementary information section D.2, 88 Fed. Reg. 73458, October 25, 2023, https://www.federalregister.gov/d/2023-23055/p-350.

<sup>90</sup> Stephen Nellis and Jane Lee, "U.S. Officials Order Nvidia to Halt Sales of Top AI Chips to China," Reuters, September 1, 2022, sec. Technology, https://www.reuters.com/technology/nvidia-says-us-has-imposed-new-license-requirement-future-exports-china-2022-08-31/.

<sup>91</sup> NVIDIA Newsroom, "NVIDIA Announces Financial Results for Second Quarter Fiscal 2024," http://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-second-quarter-fiscal-2024.

<sup>92</sup> The Semiconductor Research Corporation funds academic and public-private research, and lists several hardware security-related projects in its 2023 call for research. "Semiconductor Research Corporation - SRC," https://www.src.org/. The DARPA Microsystems Technology Office also supports a range of projects related to hardware security: "Microsystems Technology Office (MTO)," https://www.darpa.mil/about-us/offices/mto.

<sup>93</sup> "Department of Computer Science and Technology - CHERI: The Arm Morello Board," https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/cheri-morello.html; "Supply Chain Assurance | NCCoE," https://www.nccoe.nist.gov/supply-chain-assurance.

One promising set of commercial partners are firms that develop "ruggedized" AI servers for national security or other sensitive applications. Such firms typically offer products that incorporate leading AI chips in form factors optimized for challenging environments.<sup>94</sup>

To support these projects, NIST could expand on its work on Hardware-Enabled Security to create technical standards and reference implementations for on-chip governance mechanisms that are designed for wide adoption by industry. <sup>95</sup>

#### Plan for a staged roll-out and fund extensive red-teaming

To ensure that on-chip governance mechanisms are properly designed and safely introduced, the Department of Commerce and Department of Homeland Security (DHS) should establish flexible export licensing and red-teaming programs.

On-chip mechanisms will require substantial testing before being relied on in more adversarial environments, such as exports of controlled chips to the PRC. To facilitate a staged rollout approach where mechanisms can be depended upon in successively more challenging operating contexts, BIS should create export licensing arrangements where licenses can be flexibly granted for different geographies based on the security features on the device to be exported. This would allow BIS to test the utility of different hardware-based mechanisms for export control enforcement and develop robust technical standards, and it also would allow chip firms to receive feedback from their customers to improve their designs. Theoretically, this process could begin immediately with firmware updates to currently controlled chips.
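One way to picture such flexible licensing is as a mapping from the operating context assigned to each destination to the minimum set of security features a device must carry. The sketch below is hypothetical: the context tiers and feature names paraphrase the defenses listed earlier in this submission for minimally, covertly, and openly adversarial contexts, and the destination labels are invented placeholders.

```python
from enum import Enum


class Context(Enum):
    MINIMALLY_ADVERSARIAL = 1
    COVERTLY_ADVERSARIAL = 2
    OPENLY_ADVERSARIAL = 3


# Hypothetical minimum feature sets, cumulative from tier to tier.
REQUIRED_FEATURES = {
    Context.MINIMALLY_ADVERSARIAL: {"basic_security"},
    Context.COVERTLY_ADVERSARIAL: {"basic_security", "secure_software", "tamper_evidence"},
    Context.OPENLY_ADVERSARIAL: {
        "basic_security", "secure_software", "tamper_evidence",
        "provably_secure_software", "tamper_proofing",
    },
}

# Hypothetical context assignments a licensing authority might make per destination.
DESTINATION_CONTEXT = {
    "destination_a": Context.MINIMALLY_ADVERSARIAL,
    "destination_b": Context.COVERTLY_ADVERSARIAL,
    "destination_c": Context.OPENLY_ADVERSARIAL,
}


def license_eligible(destination: str, device_features: set[str]) -> bool:
    """A device qualifies if it carries every feature required for the destination's context."""
    required = REQUIRED_FEATURES[DESTINATION_CONTEXT[destination]]
    return required <= device_features


# A chip with hardened firmware and tamper-evident seals, but no tamper-proofing.
chip = {"basic_security", "secure_software", "tamper_evidence"}
print([d for d in DESTINATION_CONTEXT if license_eligible(d, chip)])
# ['destination_a', 'destination_b']
```

As mechanisms are hardened and red-teamed, a device's feature set grows and it becomes eligible for more challenging destinations, matching the staged rollout described above.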

In tandem, the Cybersecurity and Infrastructure Security Agency (CISA, within the Department of Homeland Security) should establish red-teaming and bug bounty programs to help find and patch any software and hardware security vulnerabilities in AI hardware. These programs could fit within CISA's "Secure by Design" program. They also would benefit from technical expertise and input from DARPA, which has run similar exercises as part of the System Security Integration Through Hardware and Firmware (SSITH) program. <sup>96</sup> A promising near-term starting point is setting up a public prize for finding vulnerabilities in hardware security features on today's AI chips.

#### Explore ways to address smuggling

BIS should solicit proposals from industry for anti-smuggling mechanisms that could be introduced on exported chips.

Smuggling is likely to become an increasingly significant problem. On-chip governance mechanisms, such as the location verification mechanism discussed in Section 2.2, could be very valuable for detecting, and thus deterring, smuggling. Indeed, on-chip governance mechanisms may be particularly suited to deterring smuggling, as smugglers may be relatively less technically sophisticated, and thus less able to circumvent the mechanisms.
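A latency-based location check rests on a physical bound: a signal cannot travel faster than light, so the round-trip time between a trusted landmark server and a chip caps how far apart they can be. The sketch below assumes signals travel through optical fiber at roughly 200 km per millisecond (about two-thirds of the speed of light in vacuum), and that the chip must answer the landmark's challenge with a response only it can produce, so a relay located closer to the landmark cannot answer on its behalf. Function names and thresholds are hypothetical.

```python
# Approximate signal speed in optical fiber (about two-thirds of c).
FIBER_KM_PER_MS = 200.0


def max_distance_km(rtt_ms: float) -> float:
    """Upper bound on landmark-to-chip distance implied by a round-trip time.

    Processing and routing delays only add time, so they make the bound looser,
    never tighter: a distant chip cannot make itself look closer.
    """
    if rtt_ms < 0:
        raise ValueError("rtt_ms must be non-negative")
    return (rtt_ms / 2.0) * FIBER_KM_PER_MS


def within_radius(rtt_ms: float, radius_km: float) -> bool:
    """True if the measurement proves the chip is within radius_km of the landmark."""
    return max_distance_km(rtt_ms) <= radius_km


print(max_distance_km(2.0))        # 200.0
print(within_radius(2.0, 500.0))   # True: the chip must be within 200 km
print(within_radius(40.0, 500.0))  # False: up to 4,000 km away, so not confirmed
```

A failed check does not by itself prove that a chip has been smuggled, since congested or indirect network routes also add delay; it only means the chip's claimed location could not be confirmed, which could prompt further investigation.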

Requiring or incentivizing the inclusion of such anti-smuggling mechanisms may be difficult using BIS' current authorities, but there are several possible paths to overcoming this issue; see Appendix C for further discussion.

<sup>94</sup> See, for example: "GPU Cards | Curtiss-Wright Defense Solutions," https://www.curtisswrightds.com/products/computing/gpu; "Rugged Servers and Subsystems," https://www.mrcv.com/products/rugged-servers-and-subsystems.

<sup>95</sup> Michael Bartock, Murugiah Souppaya, Jerry Wheeler, Timothy Knoll, Muthukkumaran Ramalingam, and Stefano Righi, "Hardware-Enabled Security: Hardware-Based Confidential Computing," National Institute of Standards and Technology, February 23, 2023, <a href="https://doi.org/10.6028/NIST.IR.8320D.ipd">https://doi.org/10.6028/NIST.IR.8320D.ipd</a>.

<sup>96</sup> "DARPA Finding Exploits to Thwart Tampering (FETT) Bug Bounty Capture-the-Flag Qualifier (Archived)," <a href="https://www.darpa.mil/news-events/darpa-finding-exploits-to-thwart-tampering">https://www.darpa.mil/news-events/darpa-finding-exploits-to-thwart-tampering</a>.

## Coordinate with allies

The State and Commerce Departments should coordinate with allies on policies and standards for on-chip governance.

U.S. chip suppliers such as NVIDIA currently dominate the supply of the most powerful logic chips, meaning that, conditional on successful implementation, the United States could realize many of the policy benefits from on-chip governance mechanisms through unilateral action. However, to mitigate risks to the potential effectiveness of an on-chip mechanism policy from advances in foreign chip design and production, the United States should seek buy-in and harmonization with countries occupying key chokepoints—particularly Taiwan, the Netherlands, South Korea, and Japan. <sup>97</sup> Looking beyond export control coordination, using on-chip governance mechanisms to facilitate AI governance cooperation (e.g., international agreements on compute usage reporting) would benefit from close coordination with like-minded allies such as the United Kingdom and the European Union. <sup>98</sup>

## Encourage AI chip firms to move early

Chip firms should move early to build and harden the security features required for on-chip governance.

If the U.S. government seeks to realize the national security and governance benefits of on-chip governance mechanisms, chip suppliers that are better able to apply and build on existing technical efforts will have a head start in demonstrating and achieving compliance. This could bring benefits in terms of access to markets that are subject to export controls or other relevant regulation. Leading chip suppliers (as well as other industry players with relevant capabilities) should build on and harden existing security features toward enabling on-chip governance mechanisms.

## 7 Limitations and Conclusion

Much of this submission focuses on security, as it is the principal challenge for effectively implementing on-chip governance mechanisms. However, security is a difficult topic to assess. Ultimately, the applicability of on-chip governance mechanisms for many use cases depends on hard-to-assess factors such as well-resourced adversaries' capabilities for fully invasive physical attacks, or the ability of current AI chips to resist types of attacks to which they have never been subjected.

This submission's optimism about the feasibility of secure on-chip mechanisms is influenced significantly by the relative success of Apple's Secure Enclave Processor. The Processor is a relevant point of comparison since Apple devices are among the rare devices that frequently are "attacked" by their own users to circumvent built-in restrictions. However, this comparison is still far from perfect: These attackers are typically relatively poorly resourced, without very significant financial motive to succeed, and without budgets to buy expensive equipment for sophisticated tampering.

Though adequate security will represent a novel challenge, developing on-chip governance remains an urgent and important mission for addressing national security risks from AI and maintaining American technological leadership. Developing and deploying the mechanisms described in this submission will take time (months in the most optimistic case, and years in the most likely case). If the capabilities and national security risks of AI systems continue to grow at the pace observed in 2022 and 2023, the need for highly effective controls will become acute within several years. This suggests that policymakers concerned about this issue should begin formulating policies and incentivizing the development of appropriate technologies now. Once the relevant security features have been mandated in the most powerful AI chips, they need not be used immediately: The mechanisms outlined in this submission would allow for rapid and flexible responses to new developments and threats once installed. With ambition, and in coordination with industry and key allies, the United States can create a secure foundation for a more flexible and targeted form of AI governance to meet the challenges of the 21st century.

<sup>97</sup> Center for Security and Emerging Technology and Saif M. Khan, "Securing Semiconductor Supply Chains," Center for Security and Emerging Technology, January 2021, <a href="https://doi.org/10.51593/20190017">https://doi.org/10.51593/20190017</a>; Gregory C. Allen, "Choking off China's Access to the Future of AI," Center for Strategic & International Studies, October 11, 2022, <a href="https://www.csis.org/analysis/choking-chinas-access-future-ai">https://www.csis.org/analysis/choking-chinas-access-future-ai</a>.

<sup>98</sup> For an overview of how such cooperation could fit into a broader transatlantic technology strategy, see: "Lighting the Path," <a href="https://www.cnas.org/publications/reports/lighting-the-path">https://www.cnas.org/publications/reports/lighting-the-path</a>.

## Appendix A: Glossary for AI Compute

What follows is a brief overview of different technical concepts related to AI computing that are used in this submission.

## Different Types of AI Chips

"AI chip" refers to any chip that is designed for AI applications.<sup>99</sup> This submission primarily uses this term, but several related terms are used frequently:

- AI accelerator: In computing, accelerator generally refers to a processor or component that is specialized for some type of task, and thus accelerates performance on that task relative to only using a CPU. Thus "AI accelerator" is an umbrella term for chips, or modules on a chip, that are designed to improve performance in AI applications. The only difference between "AI chip" and "AI accelerator" is that an accelerator can be a module on a larger chip.
- GPU, graphics processing unit: As the name suggests, GPUs originally were designed for generating graphics, but they were discovered to be well-suited for deep learning, and have since evolved to be even better suited for AI applications. The most important producers of GPUs are currently NVIDIA and AMD, with NVIDIA having a much greater market share for AI applications.<sup>100</sup>
- TPU, tensor processing unit: TPUs are a type of AI accelerator developed by Google, likely the most popular non-GPU AI chip, and notable for being used for many landmark AI results achieved by Google-affiliated organizations such as DeepMind.
- Other terms: Many smaller chip companies have coined new terms for their AI chips. For example, the British company Graphcore calls its chips "IPUs" (intelligence processing units).<sup>101</sup>

This submission focuses especially on NVIDIA GPUs, as:

- Large-scale AI training is performed overwhelmingly with NVIDIA GPUs or Google TPUs, with few exceptions.<sup>102</sup>
- Because TPUs are operated only in Google's own data centers, Google could implement governance mechanisms to verify and restrict the use of compute at the cloud service layer. This would make many applications of on-chip governance mechanisms, such as export control enforcement, no longer applicable.

## Distinguishing Between AI Chips and Non-AI Chips

All the above assumes that there is a distinct set of "AI chips" that one might wish to regulate. Currently, AI chips are fairly specialized, but the most popular AI chips still have major non-AI uses, and some non-AI chips still provide decent performance for AI. In general, GPUs can be divided into data center GPUs, which typically are used for commercial purposes, and gaming GPUs, which typically are used by individual consumers for entertainment.

The most commonly used chips for training large AI models are NVIDIA's data center GPUs.<sup>103</sup> These GPUs also are used for many other applications: Somewhere between 10 percent and 50 percent of the uses of NVIDIA data center chips are still non-AI.<sup>104</sup> Including all of these chips in a regulatory regime would likely have substantial costs.

<sup>99</sup> Khan and Mann, "AI Chips."

<sup>100</sup> Nathan Benaich and Ian Hogarth, "Compute Index," State of AI Report, June 2, 2023, https://www.stateof.ai/compute; Dylan Patel, "How Nvidia's CUDA Monopoly In Machine Learning Is Breaking - OpenAI Triton And PyTorch 2.0," SemiAnalysis, January 16, 2023, https://www.semianalysis.com/p/nvidiaopenaitritonpytorch.

<sup>101</sup> "IPU Processors," Graphcore, https://www.graphcore.ai/products/ipu.

<sup>102</sup> Epoch, "Parameter, Compute and Data Trends in Machine Learning," https://docs.google.com/spreadsheets/d/1AAIebjNsnJj_uKALHbXNfn3_YsT6sHXtCU0q7OIPuc4/.

<sup>103</sup> Epoch, "Parameter, Compute and Data Trends in Machine Learning."

At the time of writing, the most powerful *gaming* GPU is the NVIDIA RTX 4090, which is not as powerful as the last two generations of NVIDIA's AI-focused data center GPUs: the A100 and the H100. However, consumer gaming GPUs generally have better price-performance (cost per unit of performance) than top-of-the-line data center GPUs, due to their much lower price. This does not translate into better price-performance in large-scale training workloads, however, due to relative limitations in memory bandwidth and chip-to-chip interconnect bandwidth. Based on conversations with engineers at AI companies training large AI models, the authors expect that using gaming GPUs for large-scale AI training today would result in a significant, but not crippling, overall price-performance penalty, perhaps 2x.

Regulations related to AI chips would be more straightforward if the market were segmented more clearly into AI chips and non-AI chips. It likely would be valuable if compute vendors made more specific product differentiations. For example, NVIDIA removed support for its NVLink chip-to-chip interconnect protocol from its leading gaming GPUs. This reduced the usefulness of gaming GPUs for training powerful AI models, while having no effect on the vast majority of gamers, who likely never would have used the feature. It might be possible to use on-chip mechanisms to strengthen this distinction. For example, GPUs intended primarily for graphics applications could be required to include mechanisms that limit their usefulness for AI. This would create this kind of market segmentation and allow these chips to be sold with fewer restrictions.<sup>106</sup>

However, an imperfect regime that regulates only a somewhat arbitrary set of the most powerful chips still could be useful. It would make the lives of those wishing to circumvent regulations at least somewhat more difficult and would allow suspicion to be targeted particularly at actors who go out of their way to use chips not included in the set of regulated chips.

## Compute Clusters

AI chips are combined into **compute clusters**. Compute clusters are interconnected computers that work collectively to perform complex tasks. They consist of diverse hardware components and a software stack. A software stack is a collection of software programs organized in multiple levels, where each level abstracts away technical detail from the layer below.

A compute cluster may be built and operated directly by an organization that wants to utilize the compute, such as a university or a corporation, on their own premises. This configuration is often called "on-premises", or on-prem for short. Alternatively, a compute cluster can reside in a **data center**, which is a facility dedicated to hosting computer hardware. Large data centers, especially those operated by cloud providers, can host multiple compute clusters.

Compute clusters contain several **nodes**, which effectively are individual computers. These also are known as **servers**. AI compute nodes have **CPUs** for basic functions, and specialized **AI chips**, like NVIDIA GPUs or Google TPUs, for AI-specific computations. These chips are supported by ample memory to store model weights. A relevant example of a very powerful single node would be one of NVIDIA's DGX A100 systems,<sup>107</sup> each of which has eight NVIDIA A100 GPUs. A node also will have other components, such as drives for data storage.


<sup>104</sup> Discussion with former NVIDIA employee, 2023.

<sup>105</sup> Tim Dettmers, "Which GPU(s) to Get for Deep Learning: My Experience and Advice for Using GPUs in Deep Learning," January 30, 2023, <a href="https://timdettmers.com/2023/01/30/which-gpu-for-deep-learning/">https://timdettmers.com/2023/01/30/which-gpu-for-deep-learning/</a>.

<sup>106</sup> More specifically, a rule like this might look like: "Any chip above a particular theoretical FLOP/s performance limit needs either to have an acceptable mechanism for limiting its usefulness for AI training or to be registered and regulated as an AI chip."

<sup>107</sup> "NVIDIA DGX A100," NVIDIA, https://www.nvidia.com/en-us/data-center/dgx-a100/.

Training large AI models requires distributing the model across multiple AI chips and nodes, necessitating frequent synchronization of parameters. Traditionally, each node will have one or more **network interface cards** (NICs) that connect it to the cluster's network. These NICs will be connected to specialized components, known as **switches**, that route traffic between nodes. AI compute clusters typically use very high-end NICs and switches to enable extremely high bandwidth communication across AI chips in different nodes. Typically, AI chips within a node also are directly connected together with specialized hardware, such as NVIDIA's NVLink. NVIDIA also is developing a specialized switch, called the NVSwitch, that connects GPUs in different nodes to each other more directly, bypassing the conventional NIC.<sup>108</sup>
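In practice, the "frequent synchronization of parameters" usually means summing or averaging gradients across every chip after each training step. Ring all-reduce is one widely used algorithm for this, and NCCL implements it among others. The pure-Python sketch below simulates ring all-reduce over hypothetical in-memory workers, without real networking. It shows that each chip exchanges data with its neighbors many times per step, which is why interconnect bandwidth matters so much for large-scale training.

```python
from typing import List


def ring_allreduce(data: List[List[float]]) -> List[List[float]]:
    """Simulate a ring all-reduce (sum) over len(data) workers.

    data[i] is worker i's local vector (e.g. its gradients). All vectors must
    share a length divisible by the number of workers, so each splits evenly
    into one chunk per worker.
    """
    if not data:
        raise ValueError("need at least one worker")
    n, length = len(data), len(data[0])
    if any(len(v) != length for v in data) or length % n:
        raise ValueError("vectors must share a length divisible by the worker count")
    size = length // n
    # buffers[i][c] is worker i's current copy of chunk c.
    buffers = [[v[c * size:(c + 1) * size] for c in range(n)] for v in data]

    # Phase 1, reduce-scatter: each worker passes one chunk to its right-hand
    # neighbour, which adds it in. After n-1 steps, worker i holds the complete
    # sum for chunk (i + 1) % n.
    for step in range(n - 1):
        # All workers send simultaneously, so snapshot outgoing chunks first.
        sends = [(i, (i - step) % n, list(buffers[i][(i - step) % n])) for i in range(n)]
        for i, c, chunk in sends:
            dst = (i + 1) % n
            buffers[dst][c] = [a + b for a, b in zip(buffers[dst][c], chunk)]

    # Phase 2, all-gather: circulate the completed chunks around the ring,
    # overwriting partial sums until every worker has every finished chunk.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n, list(buffers[i][(i + 1 - step) % n])) for i in range(n)]
        for i, c, chunk in sends:
            buffers[(i + 1) % n][c] = chunk

    # Reassemble each worker's full, now identical, vector.
    return [[x for c in range(n) for x in buffers[i][c]] for i in range(n)]
```

In each all-reduce, every worker sends and receives about 2(n - 1)/n times the size of its vector. This is roughly independent of the number of workers. As a result, the ring pattern scales well, but each step remains limited by per-link bandwidth.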

The above describes the most typical structure, but different compute vendors offer different alternatives. At one extreme, Cerebras designs massive chips that integrate all of the above into a single piece of silicon.<sup>109</sup>

## An Example of a Hardware, Firmware, and Software Stack for an AI Compute Cluster

The following "stack" of components makes up a compute cluster. The most important concepts to understand for this submission are the firmware and the driver.

1. **Hardware**: This includes the physical components of the cluster, such as CPUs, AI chips (GPUs or TPUs), memory, network switches, and network interface cards.
2. **Firmware**: Firmware is the low-level software running on the hardware components, such as AI chips, switches, and network interfaces, managing their basic operations. Firmware typically is provided by the chip vendor and offers an interface between the hardware and higher-level software, including user-provided software.
3. **Operating system (OS)**: The OS manages resources and provides a platform for other software to run on. Examples include Linux distributions and Windows Server.
4. **Drivers**: Drivers enable communication between the OS and hardware components, such as AI chips and network interfaces.
5. **AI framework**: These frameworks simplify AI model development, training, and deployment. The most popular frameworks for deep learning are PyTorch and TensorFlow.
6. **Model distribution software**: Libraries and tools that help distribute the AI model across multiple nodes and chips, such as NVIDIA's NCCL (NVIDIA Collective Communications Library).
7. **Applications**: This is where the custom AI models, training scripts, and data processing pipelines reside, developed by researchers or engineers to solve specific problems.

## Appendix B: Additional Security Considerations

What follows is a detailed discussion of securing the software, firmware, hardware, and supporting ecosystem for on-chip governance. This appendix focuses primarily on physical hardware security, because that is the aspect in which on-chip governance differs most from other security contexts.

## Securing Firmware and Software

Most on-chip governance mechanisms would rely on at least some firmware, and possibly software. Even if a secure boot mechanism has verified the "integrity" of firmware and software in the sense that it is the legitimate version, this does not mean that the legitimate version is free of vulnerabilities, and securing any substantial code against adversaries is notoriously difficult.


<sup>108</sup> "NVLink & NVSwitch," NVIDIA, https://www.nvidia.com/en-us/data-center/nvlink/.

<sup>109</sup> "Product - System," Cerebras, <a href="https://www.cerebras.net/product-system/">https://www.cerebras.net/product-system/</a>.

Because attacks based on exploiting firmware and software vulnerabilities are relatively cheap, difficult to detect after the fact, and do not require physical access to the device, they should be considered in any threat model. For these reasons, they also should be assumed to be the first type of attack an adversary would attempt. Investing in other types of protection is only worthwhile if the firmware and software on a device are exceptionally secure.

It appears likely that a security module would be simple enough that it would be feasible to formally verify the correctness of all code running on the module. For example, most of the kernel code running on NVIDIA's "Peregrine" security module is formally verified,<sup>110</sup> Apple's Secure Enclave Processor runs an Apple-customized version of the L4 microkernel,<sup>111</sup> and a version of L4 has been formally verified.<sup>112</sup>

However, fully formally verified code does not mean unhackable code. To date, developing complex software stacks that are fully secure against well-resourced adversaries has proved prohibitively difficult. Serious efforts have been made to secure software through testing and more advanced methods such as formal verification, but they have not produced bug-free software. For example, internal NVIDIA investigations into operating system-like software running on its GPUs (the kind of software in which on-chip governance mechanisms would be implemented) found that, although formally verified code had significantly fewer bugs, several bugs still could be found per week of investigation, including some that were exploitable.<sup>113</sup> Beyond the sheer complexity of modern software, the complexity of the underlying compilers and hardware makes bug-free software even harder to achieve.<sup>114</sup> This complex stack gives rise to interactions that are almost impossible to fully account for during the development process.

The saving grace of software (and firmware) is that it can be updated, and thus vulnerabilities can be fixed once found. However, this poses some difficulties in the case of on-chip governance mechanisms, as the user may not want to update their system. This can be addressed either by having the hardware enforce updates via expiring licenses, or by requiring users to regularly remotely attest to what firmware and software they are running, and imposing legal consequences on users whose systems are too far out of date. Thus, the most valuable measure to secure the software on a chip would be to implement extremely well-hardened hardware features for securely enforced updates and/or remote attestation.
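The "expiring license" approach can be sketched minimally under a hypothetical scheme. In this scheme, the chip accepts a license that names the chip, a minimum firmware version, and an expiry time, and the chip only keeps operating while it holds a valid, unexpired license. For brevity, the sketch uses an HMAC from Python's standard library as a stand-in for a digital signature. A real design would more likely verify an asymmetric signature, so that the chip holds only a public key. All field names and values are illustrative assumptions.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass

# Hypothetical stand-in for the licensing authority's key. With HMAC the chip
# would also need this secret; a real design would more likely verify an
# asymmetric signature using an embedded public key.
AUTHORITY_KEY = b"example-key-not-for-real-use"


@dataclass(frozen=True)
class License:
    chip_id: str
    min_firmware: int  # lowest firmware version this license permits
    expires_at: int    # Unix timestamp after which the license is invalid
    tag: bytes         # authentication tag over the three fields above


def _payload(chip_id: str, min_firmware: int, expires_at: int) -> bytes:
    # Canonical encoding so issuer and chip authenticate identical bytes.
    return json.dumps([chip_id, min_firmware, expires_at]).encode()


def _tag(chip_id: str, min_firmware: int, expires_at: int) -> bytes:
    return hmac.new(AUTHORITY_KEY, _payload(chip_id, min_firmware, expires_at),
                    hashlib.sha256).digest()


def issue(chip_id: str, min_firmware: int, expires_at: int) -> License:
    """Issuer side: the authority can refuse renewal to out-of-date chips."""
    return License(chip_id, min_firmware, expires_at, _tag(chip_id, min_firmware, expires_at))


def chip_may_run(lic: License, chip_id: str, firmware: int, now: int) -> bool:
    """On-chip check: authentic, issued to this chip, firmware current, unexpired."""
    if not hmac.compare_digest(_tag(lic.chip_id, lic.min_firmware, lic.expires_at), lic.tag):
        return False  # forged or altered license
    if lic.chip_id != chip_id:
        return False  # license belongs to a different chip
    if firmware < lic.min_firmware:
        return False  # firmware too far out of date: user must update first
    return now < lic.expires_at  # expiry forces periodic renewal
```

Renewal is needed to keep the chip running. Users who skip updates therefore lose functionality when their license lapses, and they do not need to be caught through separate enforcement.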

Future advances in the capabilities of vulnerability-finding AI systems could significantly shift the balance between offense and defense. On the one hand, if new systems provide significant new capabilities to attackers, this could reduce the time and expertise needed to undermine the software underlying on-chip governance mechanisms. On the other hand, these systems also could be used by defenders to more thoroughly identify and remediate vulnerabilities before (and after) products are deployed. In the long run, this could trend toward a significant defensive advantage and make effective cyber defense much more feasible than before, especially if the defender has differential access to the most advanced AI systems.<sup>115</sup>

<sup>110</sup> Marko Mitic, "Systematically Securing the RISC-V - Secure Foundation for Embedded Functionality," <a href="https://www.youtube.com/watch?v=17i1kfHvWNI">https://www.youtube.com/watch?v=17i1kfHvWNI</a>.

<sup>111</sup> "Secure Enclave," Apple Support, May 17, 2021, https://support.apple.com/guide/security/secure-enclave-sec59b0b31ff/web.

<sup>112</sup> Gerwin Klein et al., "seL4: Formal Verification of an OS Kernel," in *Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles* (ACM 22nd Symposium on Operating Systems Principles, ACM, 2009), 207–20, https://doi.org/10.1145/1629575.1629596.

<sup>113</sup> Adam Zabrocki and Alex Tereshkin, "Exploitation in the Era of Formal Verification," https://www.youtube.com/watch?v=TcIaZ9LW1WE.

<sup>114</sup> Formally verified software can have exploitable vulnerabilities if:

- There are flaws, e.g., logical errors in the definition of intended behavior.
- Vulnerabilities are introduced in the process of compiling the formally verified source into a binary.
- The model of the hardware that the verification system is working with is incomplete.

Zabrocki and Tereshkin, "Exploitation in the Era of Formal Verification," at 15:12.

Ideally, code for on-chip governance mechanisms also would account for risks from fault-injection attacks, in which a chip is induced to misbehave by, for example, manipulation of its power supply or electromagnetic pulses (Chad Spensky et al., "Glitching Demystified: Analyzing Control-Flow-Based Glitching Attacks and Defenses," in 2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2021, 400–412, <a href="https://doi.org/10.1109/DSN48987.2021.00051">https://doi.org/10.1109/DSN48987.2021.00051</a>). Formally verified languages such as SPARK can be extended to have relatively deep awareness of the hardware, which can allow them to be used to avoid hardware-level bugs. However, this requires substantial additional work, especially on custom hardware (Zabrocki and Tereshkin, "Exploitation in the Era of Formal Verification," at 13:48). On the other hand, exploiting such hardware-level bugs also requires the attacker to have a deep understanding of the hardware, so they are relatively difficult to exploit.

## Securing Hardware

The central obstacle to deploying on-chip governance mechanisms today is achieving adequate hardware security: making chips either tamper-evident or tamper-proof. This section provides an overview of the technical considerations for achieving either goal.

## Tamper-Evidence

Physical attacks, by definition, involve physically manipulating the system. This makes them much easier to detect than firmware and software attacks. Methods as simple as keeping the devices under video monitoring could be sufficient.<sup>116</sup> It is also possible to use various *tamper-evident* technologies to allow inspectors to detect physical manipulation after the fact. For example, a server housing AI chips could be held together by screws that are painted over with glitter nail polish and photographed. Later, inspectors could compare the nail polish on the screws to the photos and check whether the flecks of glitter are in the same positions.<sup>117</sup> Tamper-evident metal seals also have been used heavily by the International Atomic Energy Agency to detect whether nuclear materials have been accessed inappropriately.<sup>118</sup> Publicly available evidence for the effectiveness of high-end tamper-evident techniques is limited. One report assessed 289 tamper-evident seals, including some used for safeguarding nuclear materials, and found that all could be defeated cheaply.<sup>119</sup> However, the authors blamed this largely on the limited resources spent on developing better seals. They also expressed optimism about the feasibility of developing much more effective seals if reasonable resources were devoted to that goal. Many tamper-evident techniques already have been developed for dedicated hardware security chips so that those chips can meet security level 2 and above as defined in the FIPS 140 standard.<sup>120</sup>
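The glitter-polish comparison could be partially automated. The sketch below is minimal and assumes that fleck positions have already been extracted from the reference photo and the inspection photo as (x, y) coordinates in a common, aligned frame; image registration is omitted. Each reference fleck must have a counterpart within a small tolerance, and the seal is flagged if too many flecks are missing. The tolerance, the threshold, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def fraction_matched(reference: List[Point], inspected: List[Point], tol: float) -> float:
    """Fraction of reference flecks that have an inspected fleck within tol."""
    if not reference:
        raise ValueError("reference photo must contain at least one fleck")
    unused = list(inspected)
    matched = 0
    for ref in reference:
        # Greedily pair with the nearest unused fleck, so one fleck in the
        # inspection photo cannot vouch for two reference flecks.
        best = min(unused, key=lambda p: math.dist(ref, p), default=None)
        if best is not None and math.dist(ref, best) <= tol:
            unused.remove(best)
            matched += 1
    return matched / len(reference)


def seal_intact(reference: List[Point], inspected: List[Point],
                tol: float = 0.5, min_fraction: float = 0.95) -> bool:
    """Accept the seal only if nearly all flecks are where they were."""
    return fraction_matched(reference, inspected, tol) >= min_fraction
```

Greedy matching is a simplification. An optimal assignment between the two sets of flecks would be more robust when flecks are densely packed.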

More challenging would be providing evidence of fault injection attacks—semi-invasive attacks wherein a chip is induced to misbehave, for example by manipulating its power supply or exposing it to electromagnetic pulses. <sup>121</sup> But because this would involve exposing the chip to unusual stimuli and inducing unusual states, it may be feasible to design chips to be tamper-evident against such attacks through techniques like having a specific on-chip fuse blow if the power supply is manipulated.
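The fuse idea can be illustrated with a hypothetical supply monitor. It samples the core voltage, and if any sample falls outside an allowed window, it permanently sets a one-time-programmable "fuse" bit that later inspections (or remote attestation) could read. The voltage window is an assumed placeholder. Real glitch detectors are analog circuits, not software loops, so this only models their logic.

```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class OneTimeFuse:
    """Models an eFuse: it can be blown once and never reset."""
    blown: bool = False

    def blow(self) -> None:
        self.blown = True  # irreversible: there is deliberately no reset method


@dataclass
class SupplyMonitor:
    v_min: float = 0.70  # assumed lower bound of the normal operating window (volts)
    v_max: float = 0.95  # assumed upper bound (volts)
    fuse: OneTimeFuse = field(default_factory=OneTimeFuse)

    def observe(self, samples: Iterable[float]) -> None:
        for v in samples:
            if not (self.v_min <= v <= self.v_max):
                # An out-of-window excursion is evidence of a possible glitch attempt.
                self.fuse.blow()

    def tamper_evident_state(self) -> bool:
        """What an inspector or attestation report would read back."""
        return self.fuse.blown
```

A practical design would also need to set the window so that legitimate events, such as brownouts, do not blow the fuse too often.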

<sup>115</sup> Ben Garfinkel and Allan Dafoe, "How Does the Offense-Defense Balance Scale?" *Journal of Strategic Studies* 42, no. 6 (September 19, 2019): 736–63, <a href="https://doi.org/10.1080/01402390.2019.1631810">https://doi.org/10.1080/01402390.2019.1631810</a>; Andrew Lohn and Krystal Jackson, "Will AI Make Cyber Swords or Shields?" Center for Security and Emerging Technology, August 2022, <a href="https://doi.org/10.51593/2022CA002">https://doi.org/10.51593/2022CA002</a>.

<sup>116</sup> Given that the chips in question would be in data centers, very little useful information (besides evidence of physical attacks) would be leaked by a video feed of a server rack, so this seems unlikely to create privacy and intellectual property issues.

<sup>117</sup> "Anti-Interdiction on The Librem 5 USA," *Purism*, July 20, 2022, https://puri.sm/posts/anti-interdiction-on-the-librem-5-usa/.

<sup>118</sup> Alexander Enders, "Safeguarding the Future: IAEA Looks for Improved Solutions for Passive Loop Seals for Nuclear Verification," IAEA, July 1, 2020, https://www.iaea.org/newscenter/news/safeguarding-the-future-iaea-looks-for-improved-solutions-for-passive-loop-seals-for-nuclear-verification.

<sup>119</sup> Roger G. Johnston, Anthony R. E. Garcia, and Adam N. Pacheco, "Efficacy of Tamper-Indicating Devices," *Journal of Homeland Security*, April 16, 2002. Unfortunately for the purposes of this submission, Johnston et al. did not disclose exactly which seals they tested or how their attacks worked. It therefore is difficult to say how concerning their findings truly are.

<sup>120</sup> NIST, "Security Requirements for Cryptographic Modules," May 2001, tbl. 2, https://doi.org/10.6028/NIST.FIPS.140-2.

<sup>121</sup> Jakub Breier and Xiaolu Hou, "How Practical Are Fault Injection Attacks, Really?," *IEEE Access* 10 (2022): 113122–30, https://doi.org/10.1109/ACCESS.2022.3217212.

Additionally, fault injection attacks potentially can be mitigated and detected through specific software measures.<sup>122</sup>

Certain tamper-evident measures also could require occasional inspections. In the export control context, these could be on-site or involve short-notice mail-in requirements. Inspecting all chips presumably would involve unacceptable overhead. However, a small number of random and/or risk-based inspections should be sufficient to achieve statistical confidence that large-scale tampering of chips is not occurring.<sup>123</sup> Such a program could be implemented at a fairly low cost compared to the existing budget of the Bureau of Industry and Security. Scaling it to global stocks of tens of millions of controlled AI chips, however, likely would require funding beyond the Bureau's current budget.<sup>124</sup>
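The statistical-confidence argument can be made concrete under simple assumptions. Suppose a fraction f of chips have been tampered with, and inspectors examine n chips chosen uniformly at random. Suppose also that each inspection of a tampered chip detects the tampering with probability p, which is less than 1 if detection is only probabilistic. If draws are treated as independent, which is a close approximation when the chip population is much larger than n, the chance of catching at least one tampered chip is 1 - (1 - f·p)^n. The parameter values below are illustrative, not estimates of real tampering rates.

```python
import math


def detection_probability(f: float, p: float, n: int) -> float:
    """Chance that n random inspections catch at least one tampered chip.

    f: fraction of chips tampered with; p: per-inspection detection probability.
    Assumes independent draws (sampling with replacement).
    """
    return 1 - (1 - f * p) ** n


def inspections_needed(f: float, p: float, confidence: float) -> int:
    """Smallest n such that detection_probability(f, p, n) >= confidence."""
    q = f * p
    if not (0 < q < 1) or not (0 < confidence < 1):
        raise ValueError("need 0 < f*p < 1 and 0 < confidence < 1")
    # Solve 1 - (1 - q)^n >= confidence for n, rounding up to a whole inspection.
    return math.ceil(math.log(1 - confidence) / math.log(1 - q))
```

Suppose 1 percent of chips are tampered with and each inspection catches tampering half the time. Then `inspections_needed(0.01, 0.5, 0.95)` gives 598, so roughly 600 inspections yield 95 percent confidence of catching the effort. In this model, that number does not grow with the total chip population. This is why sampling can stay cheap relative to the number of chips in circulation.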

## Remote Tamper-Evidence

Some hardware security features even could provide remote tamper-evidence: Compute operators could be required to regularly remotely attest to the integrity of their chips. Secure boot and remote attestation provide some degree of remote tamper-evidence, in that these tools can reveal if a chip is not running legitimate firmware, or if its configuration is not as expected. However, this method may not be sufficient if the chip itself has been physically tampered with, as the attacker also could compromise the remote attestation mechanism. There is ongoing research into developing protective enclosures for chips that could act as a physical unclonable function (PUF), and thus allow a chip to attest remotely to the integrity of the enclosure.<sup>125</sup> Techniques such as "probe signal injection" also could be used. In this technique, a physical device profile first is defined by injecting an electromagnetic signal to elicit a "signature," and then the device is tested periodically to check whether its physical signature has changed.<sup>126</sup> It might be possible to extend each of these technologies to remotely attest to the integrity of an entire server.
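PUF-based verification, as described in footnote 125, amounts to a challenge-response protocol. The sketch below is minimal and simulates the enclosure with a keyed hash over a hidden "physical state." A real PUF derives its responses from physical variation rather than from a stored secret. Tampering is modeled as changing that hidden state. Before sale, the verifier records a set of challenge-response pairs. Later, it issues a never-reused challenge and accepts the response if it is within a small Hamming distance of the enrolled one, since real PUF responses are somewhat noisy. The class names, sizes, and thresholds are hypothetical.

```python
import hashlib
import os
import secrets
from typing import Dict


class SimulatedEnclosurePUF:
    """Stand-in for an enclosure PUF; tampering alters its hidden 'physics'."""

    def __init__(self) -> None:
        self._physical_state = os.urandom(32)  # models unique manufacturing variation

    def respond(self, challenge: bytes) -> bytes:
        return hashlib.sha256(self._physical_state + challenge).digest()

    def tamper(self) -> None:
        # Drilling or probing through the enclosure changes its physical properties.
        self._physical_state = os.urandom(32)


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


class Verifier:
    def __init__(self, puf: SimulatedEnclosurePUF, n_pairs: int = 16) -> None:
        # Enrollment happens before sale, while the device is in trusted hands.
        self._crps: Dict[bytes, bytes] = {}
        for _ in range(n_pairs):
            c = secrets.token_bytes(16)
            self._crps[c] = puf.respond(c)

    def check(self, puf: SimulatedEnclosurePUF, max_bit_errors: int = 10) -> bool:
        if not self._crps:
            raise RuntimeError("no unused challenge-response pairs left")
        # Each pair is consumed so an eavesdropper cannot replay old responses.
        challenge, expected = self._crps.popitem()
        return hamming(puf.respond(challenge), expected) <= max_bit_errors
```

In this simulation, responses are noise-free. The `max_bit_errors` tolerance exists only to reflect the noise that physical PUFs exhibit. A tampered device's response differs in about half of its 256 bits, so it fails the check.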

## Tamper-Proofing

To *prevent* physical attacks on a chip, the chip needs to have tamper-proof packaging.<sup>127</sup> This means packaging with (a) some means of detecting that it has been disturbed, and (b) the ability to take a destructive response when a disturbance is detected. Different types of responses are required in different cases. When the goal is to protect a private key, the response is simple and easy to implement: Wipe the private key. This is typically called "zeroization." When protecting the core functionality of the chip, the response ideally would be to trigger a self-destruct mechanism that destroys the core functionality the attacker is trying to access.<sup>128</sup>

<sup>122</sup> Spensky et al., "Glitching Demystified." Even detection of fault injection attacks with moderate probability per chip would be sufficient to achieve statistical confidence that large-scale efforts will be caught if sufficient numbers of chips are inspected.

<sup>123</sup> Shavit, "What Does It Take to Catch a Chinchilla?"

<sup>124</sup> "Preventing AI Chip Smuggling to China," https://www.cnas.org/publications/reports/preventing-ai-chip-smuggling-to-china.

<sup>125</sup> Vincent Immler et al., "Secure Physical Enclosures from Covers with Tamper-Resistance," *IACR Transactions on Cryptographic Hardware and Embedded Systems*, 2019, 51–96, <a href="https://doi.org/10.13154/tches.v2019.i1.51-96">https://doi.org/10.13154/tches.v2019.i1.51-96</a>. PUFs are objects that rely on their unique physical characteristics to produce particular responses for particular inputs. These characteristics can be designed to degrade if tampered with. In this case, having tested a range of input-output pairs before a PUF is sold, a verifier later can confirm whether the PUF has been tampered with by checking whether it still generates the appropriate output for a given input.

<sup>126</sup> Carlos Moreno, Sebastian Fischmeister, and Philippe Vibien, "A method and apparatus for detection of counterfeit parts, compromised or tampered components or devices, tampered systems such as local communication networks, and for secure identification of components," World Intellectual Property Organization WO2021056101A1, filed September 23, 2020, and issued April 1, 2021, <a href="https://patents.google.com/patent/WO2021056101A1/en">https://patents.google.com/patent/WO2021056101A1/en</a>.

<sup>127</sup> Some security experts may object to the term "tamper-proof," preferring "tamper-resistant" as a more realistically achievable term. While it is true that "tamper-proof" often is used in misleading ways, its usage in this submission is intended to convey a higher standard than what "tamper-resistant" usually evokes.

<sup>128</sup> Destroying the core functionality is ideal for two reasons: Firstly, once the chip knows it is in the hands of an adversary that is actively attempting to tamper with the chip, it is safest to simply destroy the chip rather than allow the attacker more opportunities to circumvent or disable the anti-tampering functionality. Secondly and more

The detection problem is similar to the "tamper-evidence" problem. Tamper-detecting envelopes often are used in high-grade hardware security modules; they are a requirement for the highest level of security defined in the FIPS 140-2 standard for cryptographic modules. Tamper-detection usually is implemented using an envelope with current running through it, designed in such a way that its electrical properties would change if the envelope were broken. This change in electrical properties can be detected from inside the envelope, and a tamper response can be initiated. Such solutions appear to be technically feasible, but existing solutions are too bulky to be used for AI chips, as the enclosure would interfere with cooling. However, this problem is likely solvable. Several mature technologies then could be used to implement simple self-destruct mechanisms cheaply, which the envelope could trigger upon detection of a tampering attempt.
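The detect-and-respond loop described above can be sketched as a small state machine. The sketch assumes a hypothetical envelope whose resistance is sampled periodically and compared against its value at manufacture. A deviation beyond a tolerance counts as a breach. The response then zeroizes the stored key and triggers a one-way self-destruct, which is modeled as a flag standing in for an eFuse or a similar mechanism. The resistance values and the tolerance are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TamperResponder:
    baseline_ohms: float        # envelope resistance recorded at manufacture
    tolerance: float = 0.02     # assumed allowed relative drift (2 percent)
    secret_key: Optional[bytes] = b"\x13" * 32  # placeholder device key
    core_disabled: bool = False

    def breached(self, measured_ohms: float) -> bool:
        # A cut, drilled, or bridged envelope changes its electrical properties.
        drift = abs(measured_ohms - self.baseline_ohms) / self.baseline_ohms
        return drift > self.tolerance

    def poll(self, measured_ohms: float) -> None:
        if self.breached(measured_ohms):
            self.respond()

    def respond(self) -> None:
        # Zeroization: in Python this only drops the reference; real hardware
        # would overwrite the key storage itself.
        self.secret_key = None
        # Self-destruct: irreversibly disable core functionality (e.g. blow an eFuse).
        self.core_disabled = True
```

Zeroization runs before the self-destruct step because wiping a key is fast and cheap. This means the secret is protected even if the slower destructive step is interrupted.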

The most sophisticated hardware security modules appear to be very difficult to attack,<sup>131</sup> and there are no publicly known cases in which they have been physically compromised<sup>132</sup>. However, this evidence is unfortunately weak. These are niche products, almost always stored such that many other layers of defense would have had to fail for an attacker even to attempt tampering. Due to the contexts in which these devices are used, it also is likely that, even if a successful attack had occurred, the information would be classified or otherwise non-public.

Turning to self-destruct mechanisms, these are rare on commercially available chips, but such mechanisms should be relatively feasible to develop. Mature technologies exist, such as eFuses, that irreversibly modify the behavior of chips if triggered. Beyond fuses, other possible approaches include using excess voltage to deliberately damage the chip, or even extremely low-yield explosives.

To ensure that these protective measures cannot be disabled by cutting off power to the chip or removing it from the circuit board, the chip additionally needs to have a battery. The battery should be included in the tamper-proof packaging and should be able to provide sufficient power to keep the tamper-detection system active and power the zeroization or self-destruct mechanism for the duration of the life of the chip. <sup>133</sup> The chip must be programmed correspondingly to trigger the zeroization or self-destruct if that battery is about to run out.

## Securing the Supporting Ecosystem

In addition to targeting on-chip governance mechanisms themselves, attackers could target the systems of relevant controllers and verifiers. This may seem like a significant issue in that major companies and other organizations are quite frequently successfully attacked. For example, NVIDIA was compromised by a group of hackers in 2022, and some source code and design documents were stolen. <sup>134</sup> However, to truly compromise a well-designed on-chip governance mechanism, attackers would need to steal specific

generally: The strongest response to tampering efforts creates the strongest deterrent. However, if the tamper-detection mechanism in use is very sensitive and produces false positives at non-trivial rates, it likely would be preferable for the chip to lock itself until it receives some unusually strong form of re-authorization. The overseer could make this re-authorization conditional on, e.g., a physical inspection of the facility to ensure no foul play. 

129 NIST, "Security Requirements for Cryptographic Modules," May 2001, tbl. 2, 

https://doi.org/10.6028/NIST.FIPS.140-2.

<sup>&</sup>lt;sup>130</sup> Obermaier and Immler, "The Past, Present, and Future of Physical Security Enclosures."

<sup>&</sup>lt;sup>131</sup> Johannes Obermaier and Vincent Immler, "The Past, Present, and Future of Physical Security Enclosures: From Battery-Backed Monitoring to PUF-Based Inherent Security and Beyond," *Journal of Hardware and Systems Security* 2, no. 4 (December 2018): 2–4, <a href="https://doi.org/10.1007/s41635-018-0045-2">https://doi.org/10.1007/s41635-018-0045-2</a>.

<sup>&</sup>lt;sup>132</sup> More specifically, no publicly known cases where a FIPS 140-2 level 4 device has been compromised. Vincent Immler (Assistant Professor of Electrical and Computer Engineering, Oregon State University), in discussion with the author, April 19, 2023.

<sup>&</sup>lt;sup>133</sup> For an overview of battery-backed solutions, see: Johannes Obermaier and Vincent Immler, "The Past, Present, and Future of Physical Security Enclosures: From Battery-Backed Monitoring to PUF-Based Inherent Security and Beyond," Journal of Hardware and Systems Security 2, no. 4 (December 2018): 2–4, <a href="https://doi.org/10.1007/s41635-018-0045-2">https://doi.org/10.1007/s41635-018-0045-2</a>.

<sup>&</sup>lt;sup>134</sup> Lily Hay Newman, "The Lapsus\$ Hacking Group Is Off to a Chaotic Start," *Wired*, March 15, 2022, <a href="https://www.wired.com/story/lapsus-hacking-group-extortion-nvidia-samsung/">https://www.wired.com/story/lapsus-hacking-group-extortion-nvidia-samsung/</a>.

private keys. Such keys are more feasible to protect effectively than, for example, design documents that need to be accessible to large numbers of employees. As an example, the public key infrastructure upon which the security of internet traffic largely relies is rarely compromised, despite substantial incentives to do so. Indeed, the authors are not aware of any cases in which the root private key of a root certificate authority has been stolen. Nonetheless, given their sensitivity, securing keys for on-chip governance mechanisms likely would merit particularly strong information security measures—for example, using threshold cryptography to split the storage of keys across multiple independent systems.
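To illustrate what splitting key storage across independent systems can look like, the sketch below uses Shamir secret sharing, one of the simplest threshold schemes: a key is split into n shares held on separate systems, any k of which can reconstruct it, while fewer than k reveal nothing about it. This is a toy under stated assumptions, not a recommendation for how controllers should manage keys: the field size, the 3-of-5 parameters, and the function names are illustrative, and production threshold-cryptography systems (such as those studied in NIST's multi-party threshold cryptography project) typically sign or decrypt jointly without ever reassembling the full key in one place.

```python
"""Toy k-of-n key splitting with Shamir secret sharing (illustration only)."""
import secrets

# Mersenne prime 2^521 - 1: larger than any 256-bit key, so a key is one field element.
PRIME = 2**521 - 1


def split_secret(secret: int, threshold: int, num_shares: int) -> list[tuple[int, int]]:
    """Split `secret` into `num_shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be a field element")
    if not 1 <= threshold <= num_shares:
        raise ValueError("need 1 <= threshold <= num_shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, evaluated mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 recovers the polynomial's constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    key = secrets.randbits(256)  # stand-in for a 256-bit private key
    shares = split_secret(key, threshold=3, num_shares=5)
    # Any three of the five independent systems suffice to rebuild the key.
    assert recover_secret([shares[0], shares[2], shares[4]]) == key
```

An attacker who compromises one or two of the five systems learns nothing useful about the key, which is the property that makes this family of techniques attractive for especially sensitive keys.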

Another angle of attack on the supporting ecosystem is in manufacturing supply chains. To rely on on-chip governance mechanisms in sensitive operating contexts, regulators will need confidence that these mechanisms have not been compromised by untrusted firms or insider attacks during fabrication and packaging. While this area is outside the scope of this submission, the Department of Defense's "Trusted & Assured Microelectronics" program could provide a useful starting point for best practices.<sup>137</sup>

# Appendix C: Using On-Chip Mechanisms to Prevent Chip Smuggling

Most of this submission has focused on the possibility of using on-chip governance mechanisms to make export controls more targeted, and thus to allow at least some exports to China and other high-risk countries to continue. These mechanisms could also be applied as an anti-smuggling measure, supporting the enforcement of current export controls. This appendix discusses in more detail how on-chip governance mechanisms could be used to this end, and what policy levers could bring them into use.

Strong commercial incentives to smuggle AI chips into China may emerge as the gap grows between the quality and quantity of AI chips that can be procured inside and outside China. Many of these smugglers may be technically unsophisticated opportunists, who may be deterred even by relatively simple technical countermeasures that more sophisticated actors could circumvent. This may make on-chip governance mechanisms even more promising for deterring smuggling than for controlling the actions of more sophisticated actors.

The authors' recommendations are:

- BIS should consider requiring location verification mechanisms on chips exported to high diversion risk countries, and should solicit proposals for such systems from industry.
- BIS should consider working with Congress to update export control legislation to give BIS clearer authority to require exporters to implement more thorough anti-smuggling measures, including on-chip governance mechanisms. These updates could also give BIS the authority to fine exporters for failing to take sufficient measures to prevent their exports from being illegally re-exported.
- BIS should consider updating its guidance and practices to fine exporters more aggressively for compliance failures, in order to motivate exporters to take more effective measures to prevent smuggling, potentially including technical measures such as on-chip mechanisms.

<sup>135</sup> The DigiNotar case is a possible example, but it appears that the root keys were never actually extracted. Rather, attackers were able to temporarily access DigiNotar's systems to generate unauthorized certificates (Fox-IT, "Black Tulip: Report of the Investigation into the DigiNotar Certificate Authority Breach," August 13, 2012). However, compromises of lower-level certificate authorities are not unheard of (Dan Goodin, "Stuxnet-Style Code Signing Is More Widespread than Anyone Thought," *Ars Technica*, November 3, 2017, <a href="https://arstechnica.com/information-technology/2017/11/evasive-code-signed-malware-flourished-before-stuxnet-and-still-does/">https://arstechnica.com/information-technology/2017/11/evasive-code-signed-malware-flourished-before-stuxnet-and-still-does/</a>).

<sup>136</sup> Tom Simonite, "To Keep Passwords Safe from Hackers, Just Break Them into Bits," *MIT Technology Review*, October 9, 2012, <a href="https://www.technologyreview.com/2012/10/09/183378/to-keep-passwords-safe-from-hackers-just-break-them-into-bits/">https://www.technologyreview.com/2012/10/09/183378/to-keep-passwords-safe-from-hackers-just-break-them-into-bits/</a>; "Multi-Party Threshold Cryptography | CSRC," NIST, August 18, 2023, <a href="https://csrc.nist.gov/projects/threshold-cryptography">https://csrc.nist.gov/projects/threshold-cryptography</a>.

<sup>137</sup> "Trusted & Assured Microelectronics – DoD Research & Engineering, OUSD(R&E)," <a href="https://www.cto.mil/tam/">https://www.cto.mil/tam/</a>.

## How on-chip governance mechanisms could help address smuggling

Several different types of on-chip mechanisms could be used to deter smuggling.

#### **Location Monitoring**

Likely the most useful type of on-chip governance mechanism for deterring smuggling would be a location verification mechanism, as discussed in Section 2.2 above. To counter smuggling, chips could be configured to operate only if they can verify that they are close enough to a trusted landmark server in a country or region to which they can legally be sold without a license. If this is judged too intrusive, chips could instead merely be expected to verify their location: if large numbers of chips sold to a particular importer stop reporting their locations, the importer could be investigated.
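The challenge-response check behind such a scheme can be sketched as follows. This is a minimal illustration under stated assumptions, not a specification of any vendor's mechanism: the function names are hypothetical, a shared HMAC key stands in for the chip's attested signing key, the chip is simulated as a local function rather than reached over a network, and the 5 ms budget is borrowed from the Singapore example below. The landmark server sends a fresh random nonce, times the reply, and accepts only if the reply is both authentic and fast enough; whether a failed check disables the chip or is merely recorded against the importer is the policy choice described above.

```python
import hashlib
import hmac
import secrets
import time
from typing import Callable

# Illustrative round-trip budget (5 ms), taken from the Singapore example.
MAX_RTT_S = 0.005


def chip_respond(device_key: bytes, nonce: bytes) -> bytes:
    """Hypothetical on-chip firmware: prove possession of the device key."""
    return hmac.new(device_key, nonce, hashlib.sha256).digest()


def verify_location(device_key: bytes,
                    send_challenge: Callable[[bytes], bytes],
                    max_rtt_s: float = MAX_RTT_S) -> bool:
    """Hypothetical landmark-server check.

    Sends a fresh nonce, times the reply, and accepts only if the reply
    is authentic AND arrived within the round-trip budget.
    """
    nonce = secrets.token_bytes(32)  # fresh nonce, so old answers cannot be replayed
    start = time.perf_counter()
    reply = send_challenge(nonce)
    rtt = time.perf_counter() - start
    expected = hmac.new(device_key, nonce, hashlib.sha256).digest()
    return hmac.compare_digest(reply, expected) and rtt <= max_rtt_s
```

A real deployment would more plausibly have each chip sign challenges with a private key that never leaves the chip, so that landmark servers hold only public keys and cannot forge responses. The check is also only as strong as the chip's protection of that key: if the key could be extracted, a smuggler could answer challenges from a separate device placed near the server.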

For example, if chips were exported for use in Singapore, they could be expected to respond to cryptographic challenges from a trusted verification server in Singapore within 5 ms. This would be practically attainable for chips in a Singaporean datacenter but physically impossible for chips in China: in 5 ms, a signal traveling at the speed of light can cover at most 1,500 km, so a chip that answers in time can be at most 750 km from the server, which is not even far enough to reach the southern tip of Vietnam.
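The 750 km figure follows from halving the distance light covers within the round-trip budget, since the challenge must travel to the chip and the response must travel back:

$$
d_{\max} = \frac{c \cdot t_{\mathrm{RTT}}}{2} \approx \frac{(3 \times 10^{5}\ \mathrm{km/s}) \times (5 \times 10^{-3}\ \mathrm{s})}{2} = 750\ \mathrm{km}
$$

Light in optical fiber travels at roughly two-thirds of $c$, and both ends add processing delay, so in practice a chip that passes the check must be even closer than this; the formula gives only the most generous physically possible distance.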

Re-programmable security modules, such as NVIDIA's Peregrine, could likely be used to implement this type of location verification in firmware. Setting up a network of trusted landmark servers in all major cities near China and Russia would likely be very cheap relative to the value of the chips being monitored. The Institute for AI Policy and Strategy is conducting ongoing research to assess the costs of such a system more precisely.

#### **Operating Licenses**

An operating license mechanism could potentially be required on chips exported to high diversion risk countries. This would allow the operating licenses to be revoked if the chips were later discovered to have been smuggled. Of course, this would be a relatively intrusive approach, and location restrictions would typically be preferable.
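A minimal sketch of how an operating license with revocation might work, under illustrative assumptions rather than any specific proposed design: an overseer issues short-lived licenses bound to a device ID, firmware refuses to operate without a valid, unexpired license, and revocation consists of declining to renew. All names are hypothetical. HMAC from Python's standard library stands in for a digital signature only to keep the sketch dependency-free (a real chip would hold just the issuer's public key, so that extracting it would not let anyone mint licenses), and the chip is assumed to have a trustworthy clock, which is itself a nontrivial requirement.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class OperatingLicense:
    device_id: str
    expires_at: int  # Unix time after which the chip should stop operating
    tag: bytes       # issuer's MAC over (device_id, expires_at)


def _tag(key: bytes, device_id: str, expires_at: int) -> bytes:
    # Assumes device IDs never contain "|", so the payload is unambiguous.
    payload = f"{device_id}|{expires_at}".encode()
    return hmac.new(key, payload, hashlib.sha256).digest()


class LicenseIssuer:
    """Hypothetical overseer that grants short-lived, renewable licenses."""

    def __init__(self, key: bytes, validity_s: int):
        self.key = key
        self.validity_s = validity_s
        self.revoked: Set[str] = set()

    def revoke(self, device_id: str) -> None:
        # E.g., after evidence that this chip was smuggled.
        self.revoked.add(device_id)

    def renew(self, device_id: str, now: int) -> Optional[OperatingLicense]:
        if device_id in self.revoked:
            return None  # revocation = refusing to renew; the old license lapses
        expires_at = now + self.validity_s
        return OperatingLicense(device_id, expires_at,
                                _tag(self.key, device_id, expires_at))


def chip_may_operate(key: bytes, my_device_id: str,
                     lic: OperatingLicense, now: int) -> bool:
    """Hypothetical firmware check: authentic, issued to this chip, not expired."""
    if lic.device_id != my_device_id:
        return False
    authentic = hmac.compare_digest(lic.tag, _tag(key, lic.device_id, lic.expires_at))
    return authentic and now < lic.expires_at
```

Because licenses expire, a chip that is smuggled and then kept offline eventually stops working as well. The validity period trades off intrusiveness (how often chips must check in) against how quickly a revocation takes effect.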

## **Policy Options**

There are several obstacles to implementing on-chip governance mechanisms as a useful anti-smuggling measure. In addition to the technical obstacles discussed in this submission, there are potential policy-related obstacles. In particular, to be useful for preventing smuggling, on-chip mechanisms would ideally be included on chips exported to third countries to which exports are currently not controlled, which may be difficult.

There are two primary questions at hand:

- Which authorities to use? Can this be achieved via BIS's existing authorities, or would new authorities be required?
- How to motivate chip companies to invest in this without crippling exports? In practice, this would require mandating that exported chips include the relevant mechanisms starting in a particular future year. However, choosing this year is difficult: there is a risk of accidentally banning exports if the deadline is set too soon, or of undermining the controls if the security standards are set too low.

Below, we discuss several different approaches for tackling these problems.

#### Approach 1: Relying on existing BIS authorities

BIS could require on-chip mechanisms as a condition for exporting AI chips. However, this would require placing controls on the export of these chips to all countries, including U.S. allies. Such a move could be politically unpopular.

A more moderate option could be to require these mechanisms on chips exported to specific countries where smuggling risk is particularly high. However, this could still be politically costly, as it would require including countries such as Singapore, which account for a substantial share of chip exports. Additionally, relying on existing BIS authorities would likely require BIS to commit to implement a specific requirement at some point in the future, which may be difficult to do credibly.

A future commitment is likely necessary because the required mechanisms are not yet in place, so chip vendors would need a grace period to implement them. This means that BIS would need to issue a rule requiring these mechanisms from some future date onward. For more novel mechanisms that could be moderately expensive to develop and implement, this date may need to be several years out, in which case it may be difficult for BIS to credibly commit to following through on the rule, and chip companies may see lobbying against the rule as a cheaper alternative to implementing the mechanism. Fortunately, it may be possible to implement a location verification system relatively quickly and cheaply through firmware updates to existing chips. BIS should ask chip design companies whether this would indeed be possible.

#### Approach 2: New legislation

An alternative approach would be for Congress to require these governance mechanisms to be present on controlled AI chips exported by U.S. companies, starting in a particular future year. The new law could also clarify BIS's authority to direct exporters to use these mechanisms in various ways to prevent smuggling. This would make it relatively easy for BIS to require exporters to actually use these mechanisms for export control compliance.

Some mechanisms, such as coarse location monitoring, could potentially also be required on chips sold in the U.S., to prevent these chips from being smuggled directly out of the U.S. However, this type of smuggling is likely already substantially more difficult than smuggling through intermediate countries.<sup>138</sup>

This approach still raises the question of implementation timelines: the law would need to be phrased so that the mechanisms are required starting from some future date, which would need to be determined in consultation with industry and other technical experts. However, because the law would be requiring a technology that does not yet quite exist, there would be a risk that companies would be unable to meet the timeline, and that a ban on the sale of AI chips would be enacted by accident. If this scenario threatened to materialize, Congress could always pass an extension. Ideally, the difficulty of passing such extensions would still be enough to motivate chip companies to do their best to meet the timeline.

#### Approach 3: Motivating better compliance via more aggressive enforcement

In other industries, such as finance, it is common for companies to spend significant amounts of money and effort specifically on compliance with laws such as anti-money laundering regulations,<sup>139</sup> in part motivated by substantial fines levied for violations.<sup>140</sup> In comparison, exporters of controlled goods invest relatively little in compliance, but could potentially be motivated to do so by the threat of higher fines. This could lead exporters to adopt a range of improved measures for preventing illegal re-exports or transfers. These measures could include on-chip governance mechanisms, if such mechanisms are cost-effective, while leaving exporters free to choose the means that work best for them.

As a first step, BIS could set a higher standard for the due diligence that exporters need to conduct, and fine exporters more aggressively if they fail to meet it. Part of the problem also appears to be that the original exporter typically is not held liable if products it has exported are then illegally re-exported. Therefore, a more ambitious solution could be new legislation under which exporters could always be subject to some fine if their exports eventually end up illegally re-exported or transferred, regardless of any nominal compliance measures. This would incentivize companies to take more meaningful actions to actually prevent diversion, rather than simply going through the motions of due diligence.

This general approach also gives companies a natural incentive to deploy progressively more sophisticated on-chip mechanisms as they are developed, while incentivizing them to invest more in other compliance measures in the meantime.

<sup>138</sup> Erich Grunewald and Michael Aird, "AI Chip Smuggling into China: Potential Paths, Quantities, and Countermeasures," Institute for AI Policy and Strategy, October 4, 2023, p. 68, <a href="https://www.iaps.ai/research/ai-chip-smuggling-into-china">https://www.iaps.ai/research/ai-chip-smuggling-into-china</a>.

<sup>139</sup> Dan Willis, "Financial Crime Compliance Costs Skyrocket to \$206.1bn Globally," *FinTech Global* (blog), September 27, 2023, <a href="https://fintech.global/2023/09/27/financial-crime-compliance-costs-skyrocket-to-206-1bn-globally/">https://fintech.global/2023/09/27/financial-crime-compliance-costs-skyrocket-to-206-1bn-globally/</a>.

<sup>140</sup> Jaclyn Jaeger, "Report: Fines against Financial Institutions Hit \$10.4B in 2020," *Compliance Week*, December 22, 2020, <a href="https://www.complianceweek.com/surveys-and-benchmarking/report-fines-against-financial-institutions-hit-104b-in-2020/29869.article">https://www.complianceweek.com/surveys-and-benchmarking/report-fines-against-financial-institutions-hit-104b-in-2020/29869.article</a>.